(57) Abstract

Comparative Molecular Field Analysis (CoMFA) is an effective computer-implemented
methodology (Fig. 5) of 3D-QSAR employing both interactive graphics and statistical
techniques for correlating shapes of molecules with their observed biological
properties. For each molecule of a series of known substrates, the steric and
electrostatic interaction energies with a test probe atom are calculated at spatial
coordinates around the molecule. Subsequent analysis of the data table by a partial
least squares (PLS) cross-validation technique (Fig. 8) yields a set of coefficients
which reflect the relative contribution of the shape elements of the molecular series
to differences in biological activities. Display (Fig. 3B) in three dimensions in an
interactive graphics environment of the spatial volumes highly associated with
biological activity and comparison with molecular structures yields an understanding
of intermolecular associations. CoMFA will also predict the biological activity of new
molecular species.

COMPARATIVE MOLECULAR FIELD ANALYSIS (CoMFA)

Technical Field 

This invention relates to the method of comparing in three dimensions the
steric and electrostatic fields exerted by molecules with similar binding affinities for
a common molecule, and extracting, by cross correlation of the fields, the most
important common topological features related to the observed differences in
binding affinities among those molecules. This method is particularly useful in
understanding structure/function relationships in biological chemistry.

Background Art 

During the past three decades modern biology has come to recognize the
importance of the three-dimensional conformation/shape of biological molecules in
relation to the observed function and activity of these molecules. Beginning with
the first identification of alpha helical structures in proteins, through the solution of
the structure of DNA as a hydrogen-bonded intertwined double helix, to current
studies by X-ray crystallography of enzyme-substrate complexes, appreciation of
the role of shape as a determining factor has continually increased. In fact, it is
now understood that a proper description and understanding of the functioning of
most biological macromolecules is dependent on an understanding of the
three-dimensional shape of the molecules. The situation is often analogized to that of a
three-dimensional jigsaw puzzle, where the parts which must fit together interlock
in specific patterns in three dimensions. It is now realized that the binding of a
molecular substrate to an enzyme is determined by the ability of the substrate to fit
a notch/groove/cavity within the enzyme in such a manner that the substrate is both
mechanically and chemically stabilized in the correct three-dimensional and
thermodynamic orientation to promote the catalytic reaction. Similarly, it has long
been recognized that the highly specific binding of antibodies to antigens is
accomplished by the recognition by the antibody of the surface shape-specific
features of the antigen molecule.

Not only is the understanding of these three-dimensional puzzles
important to a fundamental understanding of enzymology, immunology, and
biochemistry, but such studies are of major interest to therapeutic drug researchers.
Most drug effects are accomplished by the binding of a drug to a target receptor
molecule. To the extent that the nature of the binding is more fully understood, it
should be possible to design drugs which bind to their target molecules with greater
precision and effect than even naturally occurring compounds. This therapeutic
quest is especially important in cancer research where the generalized side effects
of many therapeutic drugs are undesirable and more specific drug interactions are
desired.

Along with the recognition of the importance of the three-dimensional
stereo conformation of biomolecules has come an appreciation of just how difficult
it is to understand how the conformation of the molecules is related to their
activity. At the present time, the only known method for determining exactly the
three-dimensional shape of any biomolecule is by means of X-ray crystallography.
While the number of biomolecules which have had their structure successfully
determined by crystallography is growing rapidly, the total number remains
relatively small, and even fewer have been studied in crystal form in
conjunction with their bound substrates or ligands. For the few ligand-biomolecule
combinations which have been successfully analyzed by X-ray crystallography,
there is still the open question as to whether the complex exists in a different
conformational combination in solution than it does in the crystallized form used for
the study, although the evidence suggests that there is no major difference.

The study of the three-dimensional conformation/shape of molecules is
thus seen to be one of the core questions in modern molecular biology and
biophysics. With the possible exception of the introduction in the not too far
distant future of coherent X-ray lasers which may make the three-dimensional
imaging of biological macromolecules considerably easier, there have been no
fundamental advances in the instrumental techniques available during the last
several years. Nor have recent advances in protein sequence determination, either
by direct sequencing of the proteins or by sequencing of the precursor DNA
molecules, been of much help in elucidating the three-dimensional structures since
it was discovered early on that, due to the highly folded protein structure, amino
acid side chains from vastly different sections of a protein are involved in the
conformation of the receptor or binding site. Similar considerations are true with
respect to antibody formation. Only recently has a proposal been made towards
understanding the initiation of alpha helices, perhaps the simplest tertiary protein
structure, based on a knowledge of the amino acid sequence in a protein. See
Presta, L.G.; Rose, G.D. Science 1988, 240, 1632.

Recognizing the difficulty and length of time involved in obtaining X-ray
crystallographic structures of biomolecules, researchers have pursued alternate,
though less exact, paths towards obtaining information on the stereochemical
binding of molecules. One such approach, taken by experimental chemists, has
been to apply an understanding of basic chemical principles to analyze the likely
binding sites of substrates. By examining the chemical structures of various ligands
known to bind to a given protein, and relying on an understanding of generalized
chemical and stereochemical principles, chemists have made educated guesses
concerning which parts of the substrate/ligand would most likely be involved in
binding to the protein. Based on these educated guesses, new compounds have
been synthesized incorporating predicted reactive sites. The binding affinities of
the new substrates for the desired protein have been measured. Some reasonable
measure of success in understanding stereochemical binding has been achieved by
this empirical method, but failure has been much more frequent than success. This
scheme, though rational, is basically one of trial and error and does not lead to a
coherent approach to finding or designing new molecules with the desired binding
affinities.

Attempts have been made over the years to place the understanding of
stereochemical interactions of biomolecules and the development of new substrate
molecules on a more quantitative footing. These approaches attempt to
systematically relate differences in structures of similar substrate molecules to
differences in their observed biological activities. Thus, a "structure activity
relationship" (acronymed SAR) is sought for a given class of substrates/ligands.
To the extent that these approaches have now been quantified, they are now
referred to as "quantitative structure activity relationships" (acronymed QSAR).
Generally, the relationship sought in formulating a QSAR is cast in the simplest
possible format, that of a linear combination of elements. Thus the measured
biological value, V, is sought to be explained by a series of terms, A, B, C, etc. as
the linear combination: V = A + B + C + .... The QSAR approach can be
used to relate many measures/properties of molecules which are somehow reflective
of their structure, such as partition coefficients and molar refractivity. In the past
these indirect measures of shape have been used in QSAR studies since using direct
measures of shape proved conceptually and computationally difficult. As the art
has progressed, and as the structural differences used in QSAR studies have become
primarily molecular shape differences, the field of "three-dimensional quantitative
structure activity relationships" (acronymed 3D-QSAR) has evolved.

The 3D-QSAR approach quantifies chosen shape parameters and tests to
see if a correlation can be found between those parameters and a biological
variable, typically binding affinity. It has turned out to be a very complex problem
to model the interaction between a ligand and its receptor. The principal difficulty
has been finding a quantitative way in which to express the simple concept of
shape. As is often the case, what is visually obvious to the human eye and brain is
complex to describe quantitatively or mathematically. While describing shape is
difficult enough, searching for similarities in shape using shape descriptors which
are, at best, inadequate turns out to be exceedingly difficult.

The general approach used in the QSAR methodology relies on the fact
that, for most proteins, there are a number of chemical compounds or substrates
having known structural differences which bind with differing affinities to the
protein. The rationale behind the 3D-QSAR approach is that it should be possible
to derive shape descriptors which, when applied to the various substrates, will
reflect the different binding affinities. In 3D-QSAR a similar underlying
assumption is made as in other QSAR approaches, i.e., that the relevant biological
parameter, usually a binding affinity, can be represented as a linear combination of
weighted contributions of the various shape descriptors for the substrate molecules.
Once a whole series of substrates are described with the same shape descriptors, it
should be possible to compare or correlate the shape descriptors and extract the
critical shape determinants found to be associated with the differences in biological
activity amongst the substrates.

From a knowledge of the most significant structural shape elements of the
substrate or ligand, one could then infer the important elements of the receptor site
on the protein. Ideally, in this process one would have at least as many substrates
to compare as one had variables among the shape descriptors. Thus, a system of



WO 92/22875 PCT/US91/04292

equations with the number of equations equaling the number of shape descriptors
with unknown weighting coefficients would exist and could be solved exactly.
However, in practice, it quickly became evident that, even with simplifying
assumptions, using available shape descriptors to describe the properties of an
unknown shape, the number of descriptor variables far exceeds the number of
available substrates for which binding data is known. Thus, rather than getting an
exact solution, it was found that approximating statistical methods had to be used to
extract from the numerical shape descriptors the shape elements which best
correlated with observed biological activity. However, until very recently statistical
methods were not available which could extract useful information from a system of
equations containing many more variables than equations.
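The non-uniqueness of such a system can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical table of 3 substrates and 6 shape descriptors filled with random numbers (not data from any study): two different coefficient vectors reproduce the activities equally well, so the equations alone cannot identify which descriptors matter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical table: 3 substrates (rows) described by 6 shape descriptors (columns).
X = rng.normal(size=(3, 6))
y = rng.normal(size=3)  # hypothetical measured activities

# Minimum-norm least-squares solution; with more columns than rows the fit is exact.
w_min, *_ = np.linalg.lstsq(X, y, rcond=None)

# The trailing rows of Vt (beyond the rank of X) span the null space of X.
_, _, Vt = np.linalg.svd(X)
null_vec = Vt[-1]  # X @ null_vec is numerically zero

# A second, different coefficient vector that fits the activities just as well.
w_alt = w_min + 5.0 * null_vec
```

Any multiple of a null-space vector can be added in the same way, which is why additional statistical criteria are needed to pick a meaningful solution.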

During the past decade work has progressed in this field. From chemical
analysis of substrate-protein complexes, it is known that the molecular interactions
that produce an observed biological effect are usually non-covalent. Thus, the
forces important for intermolecular association are believed to arise from
hydrophobic, van der Waals (steric), hydrogen bonding, and electrostatic
interactions. Attempts have been made to build shape descriptors based on these
properties, but, unfortunately, the immense number of degrees of freedom and
large labile protein-substrate complexes make the mathematical modeling of the
shape of the complexes extremely difficult. Further simplifying criteria and
assumptions were found to be necessary.

One such approach, entitled the Molecular Shape Approach, developed
independently by Simon, et al. (see Simon, Z.; Badilenscu, I.; Racovitan, T. J.
Theor. Biol. 1977, 66, 485 and Simon, Z.; Dragomir, N.; Planchithin, M.G.;
Holban, S.; Glatt, H.; Kerek, F. Eur. J. Med. Chem. 1980, 15, 521) and
Hopfinger (see Hopfinger, A. J. J. Am. Chem. Soc. 1980, 102, 7196), compares
net rather than location-dependent differences between molecules. That is, a shape
characteristic of the total molecule is calculated in which the details of specific
surface characteristics are merged into an overall molecular measure. The most
active molecule in a series (in the sense of biological affinity) is considered to be a
template molecule which has an optimal fit to the receptor site in the protein.
Differences in activity amongst the series of substrate molecules are, therefore,
potentially correlated by multiple regression analysis with three structure (or
shape) parameters definable for each member of the series. The shape parameters
initially considered were: 1) the common volume, 2) the volume possessed
by the most active molecule, but not by the less active molecule, and 3) the volume
possessed by the less active but not by the most active molecule in the series.
Hopfinger describes these parameters as Common Overlap Steric Volumes and
interprets them as quantitative measures of relative shape similarity.
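The three volume parameters can be estimated by counting points on a fine grid. The sketch below is only an illustration under stated assumptions: each molecule is modeled as a union of atom-centered spheres, the function names `occupancy` and `overlap_volumes` are hypothetical, and the two single-sphere "molecules" are made-up inputs. It is not the procedure used in the cited papers.

```python
import numpy as np

def occupancy(centers, radii, grid):
    """Boolean mask: True where a grid point lies inside any atom sphere."""
    d = np.linalg.norm(grid[:, None, :] - centers[None, :, :], axis=2)
    return (d <= radii[None, :]).any(axis=1)

def overlap_volumes(mol_ref, mol_other, spacing=0.25, pad=3.0):
    """Estimate (common, reference-only, other-only) volumes in cubic angstroms.

    Each molecule is a (centers, radii) pair; volumes are grid-point counts
    multiplied by the volume of one grid cell.
    """
    pts = np.vstack([mol_ref[0], mol_other[0]])
    axes = [np.arange(lo - pad, hi + pad + spacing, spacing)
            for lo, hi in zip(pts.min(axis=0), pts.max(axis=0))]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    in_ref = occupancy(*mol_ref, grid)
    in_other = occupancy(*mol_other, grid)
    cell = spacing ** 3
    return (cell * np.sum(in_ref & in_other),
            cell * np.sum(in_ref & ~in_other),
            cell * np.sum(~in_ref & in_other))

# Two hypothetical single-sphere "molecules" of radius 1.5 A, centers 1.0 A apart.
ref = (np.array([[0.0, 0.0, 0.0]]), np.array([1.5]))
other = (np.array([[1.0, 0.0, 0.0]]), np.array([1.5]))
common, ref_only, other_only = overlap_volumes(ref, other)
```

Note that all three numbers are whole-molecule aggregates: two quite different local shape changes can produce the same volumes, which is the limitation the text attributes to net descriptors.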

More recently Hopfinger (see Hopfinger, A. J. J. Med. Chem. 1983, 26,
990) has constructed a new set of molecular shape descriptors derived from the
potential energy field of a molecule. In this approach, Hopfinger uses molecular
mechanics potentials as a means of estimating the molecular potential energy fields:



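$$P_u(R, \theta, \phi) = \sum_{i=1}^{n} \left[ \frac{a(T)_i}{r_i^{6}} + \frac{b(T)_i}{r_i^{12}} + \frac{Q_i \, Q(T)}{\varepsilon \, r_i} \right]$$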



In this equation the molecular potential energy field $P_u(R, \theta, \phi)$ at any
given point $(R, \theta, \phi)$ for molecule u is defined; $a(T)_i$ and $b(T)_i$ are the attractive
and repulsive potential energy coefficients, respectively, of atom i of molecule u
interacting with the test probe T, which is treated as a single force center; $Q_i$ and
$Q(T)$ are, respectively, the charge densities of the ith atom and the test probe; $\varepsilon$
is the dielectric term; n is the number of atoms in u; and $r_i$ is the distance
between atom i and the test probe. Hopfinger suggests that pairwise field-difference
[$\Delta P_u$] descriptors may correlate with biological parameters in a 3D-QSAR.
Note, however, that this is a net molecular shape descriptor rather than a
specific location-dependent shape descriptor.
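A probe field of this general kind can be evaluated with a few lines of code. The sketch below assumes a 6-12 term plus a Coulomb term summed over atoms, with the attractive coefficient carrying a negative sign; the exact functional form, the sign convention, the function name `probe_field`, and all numerical values are assumptions made for illustration, not Hopfinger's published parameters.

```python
import numpy as np

def probe_field(point, coords, a, b, q, q_probe=1.0, eps=1.0):
    """Probe energy at one point: sum over atoms of a_i/r^6 + b_i/r^12 + q_i*q_probe/(eps*r).

    Assumed convention: a_i is negative (attractive), b_i is positive (repulsive).
    """
    r = np.linalg.norm(coords - point, axis=1)
    return float(np.sum(a / r**6 + b / r**12 + q * q_probe / (eps * r)))

# Two hypothetical one-atom "molecules" that differ only in partial charge.
coords = np.array([[0.0, 0.0, 0.0]])
a = np.array([-1.0])
b = np.array([1.0])
point = np.array([2.0 ** (1.0 / 6.0), 0.0, 0.0])  # minimum of the 6-12 term for a=-1, b=1

p_u = probe_field(point, coords, a, b, q=np.array([0.0]))
p_v = probe_field(point, coords, a, b, q=np.array([0.2]))
delta_p = p_v - p_u  # pairwise field difference at this single point
```

Summarizing such differences into one number per pair of molecules yields a net descriptor; keeping the value at every sampled point separately is what makes a descriptor location-dependent.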

A second approach is the Distance Geometry Method of Crippen. See,
for example, Ghose, A.; Crippen, G. J. Med. Chem. 1985, 28, 333. In this
approach the user must provide a "pharmacophore" or a list of potential
receptor-binding atoms on each of the substrates/ligands having specified physicochemical
properties. Knowledge of the pharmacophore comes from chemical studies of the
binding properties of the given series of substrate molecules. The user must also
provide a "binding site", a set of points in Cartesian space which are capable of
interacting with a nearby pharmacophore atom, the magnitude of the attraction or
repulsion depending on the nature of the atom. The geometrically allowed
interactions between the ligand atoms and the binding site are determined. Each
ligand is free to move or experience torsional deformations, in any fashion that
minimizes the sum of its site points' energies of interaction with the "binding site".
Thus, following Crippen, who again assumes a linear function for the interaction,
the binding energy of a particular binding mode will be given by:

$$E_{calc} = -C\,E_c + \sum_{i=1}^{n_s} \sum_{j=1}^{n_p} C_{i',j} \sum_{k=1}^{n_o} P_j(t_k)$$

where $E_c$ is the energy of the conformation; the C's are the coefficients to be
determined by quadratic programming; i' is the type of site i; $n_s$ represents the
number of site pockets; $n_p$ represents the number of parameters to correlate with
that site pocket interaction; $n_o$ represents the number of atoms occupying that site
pocket; and $P_j$ represents the jth physicochemical parameter of the atom of type $t_k$.

A successful 3D-QSAR is found when the sum of the energies of
interaction obtained is suitably close to the binding affinities observed
experimentally. The result provides both a receptor map (the position and nature of
the "binding site" points) and, for each member of the series, an active
conformation of that molecule. In both the Hopfinger and Crippen approaches, it
will be noted that an initial educated guess must be made for the choice of the
active conformation of the molecule before the analysis can be done, and Crippen
must further hypothesize an actual receptor site map in three dimensions.

Another major problem in any quantitative approach to shape analysis is
the fact that, in solution, most compounds exist as a mixture of rapidly
interequilibrating shapes or conformers. Generally, it is not even known which of
the multiple conformations of a molecule is responsible for its measured biological
affinity. Once again, educated guesses must be made to decide which of the many
molecular conformations will be used in a 3D-QSAR analysis. The existence of
multiple conformations further complicates the task of choosing the correct
molecular orientation in which to make the comparison between the substrate
molecules. Obviously, the ability of any shape measure to compare molecular
shapes relies upon the correct relative orientation of the molecules when the shape
measure is first determined. The same molecule when compared to itself rotated by
90° would not likely show any common structural features. Therefore, several of
the 3D-QSAR methods rely upon alignment rules to guarantee that only the variable
or differing parts of the molecules make the greatest contribution to the shape
comparison. It is obvious that the existence of multiple conformations for a given
molecule complicates this task.

Typically then, a 3D-QSAR analysis starts out with many shape
dependent parameters for a relatively few molecules whose biological activity, such
as binding affinity, is known. This results in a series of linear relationships/
equations relating the shape parameters to the biological measures having many
more unknowns (columns) than relationships (rows). Except in the limiting cases
of shape descriptors where oversimplifying assumptions are made, no statistical
regression or correlation methods were available until recently which could give
any possible hope of solving such a set of equations.

Disclosure Of Invention

The present invention is an effective computer methodology employing
both interactive graphics and statistical techniques for correlating shapes of
molecules with their biological properties. The method of the present invention
utilizes a new approach to 3D-QSAR which provides an objective and quantitative
measure of the three-dimensional shape characteristics of all areas of a molecule
and, at the same time, requires very few limiting assumptions. The quantitative
description of the shape of the molecule is derived from an analysis of the steric
and electrostatic interactions of the atoms comprising the molecule with a test
probe. The resulting interaction energies calculated at all intersections in a
three-dimensional grid or lattice surrounding the molecule form the quantitative shape
descriptors entered along with the molecule's measured biological activity as a row
in a data table.

Each molecular conformation may be similarly described as a row of
lattice point energies associated with the same measured biological activity.
Selection of the conformers of choice can be made on either an empirical basis or
by a weighted average, typically a Boltzmann distribution of the various
conformations. A row of interaction energies representative of the conformations
of a given molecule is then used. The resulting 3D-QSAR table typically has
several thousand columns of lattice point energies and a number of rows
corresponding to the number of molecules in the series being investigated.
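A Boltzmann-weighted average of conformer rows can be sketched as follows. This is a minimal illustration, assuming conformer energies are given in kcal/mol and a temperature of 298.15 K; the function name `boltzmann_average` and the example numbers are hypothetical.

```python
import numpy as np

K_B = 0.0019872041  # gas constant in kcal/(mol*K)

def boltzmann_average(field_rows, conformer_energies, temperature=298.15):
    """Average conformer field rows with weights proportional to exp(-E/kT)."""
    e = np.asarray(conformer_energies, dtype=float)
    w = np.exp(-(e - e.min()) / (K_B * temperature))  # shift by the minimum for stability
    w /= w.sum()
    return w @ np.asarray(field_rows, dtype=float), w

# Two hypothetical conformers, three lattice points each; the second lies 1 kcal/mol higher.
rows = np.array([[1.0, 0.0, 2.0],
                 [3.0, 4.0, 0.0]])
avg_row, weights = boltzmann_average(rows, [0.0, 1.0])
```

The lower-energy conformer dominates the averaged row, so the single row entered into the table reflects the conformations the molecule is most likely to adopt.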

Theoretically, a complete description of the shape differences between the
molecules under study is contained in this table, but previously no statistical
methodology was available to extract useful information from the table. Unless
either limiting assumptions about binding sites are made which reduce the number
of columns, or knowledge exists about the specific binding sites of a specific
conformation, an infinite number of sets of coefficients can be calculated which
would yield the same biological parameter values. Early in the 1980's a statistical
methodology was derived which explicitly solves this type of multivariate problem.
This methodology is called Partial Least Squares Analysis (PLS).

The software of the present invention permits four different alternative
procedures to be used to align the molecules in the three-dimensional lattice. They
are: 1) a user specified alignment based on other data; 2) the Fit routine;
3) the Orient routine; and, finally, 4) the Field Fit procedure which minimizes the
differences in the calculated fields of the atoms between the various molecules.
Preferably the alignment will be done by Field Fit. A 3D-QSAR table is generated
and then analyzed according to the PLS method as modified for CoMFA.
The resulting solution of the 3D-QSAR table yields coefficients of the column terms
which represent the relative contributions of the various lattice positions to the
biological activity. Since the solution is re-expressed in terms of interaction energy
values similar to those that were calculated in creating the 3D-QSAR table, it is
possible to reverse the process and display on a video terminal a plot of the
interaction energies to reveal those areas of molecular shape associated with
differences in biological activity. In an interactive graphics display environment,
the invention allows the user to vary the analytical options and, in a reasonable
time frame, see the areas of molecular shape most important to biological activity
highlighted on the screen in front of him. By a study of the changing display as the
parameters are varied, the user may obtain an understanding of how particular
shape characteristics of the molecule are important to its biological activity.

It is a purpose of the present invention to compare the shapes of
molecules with shape descriptors highly sensitive to local surface area differences.
In addition, it is a purpose of the present invention to provide a methodology for
making a quantitative estimate of the importance of the various components of
molecular shape to the biological activity of a molecule. A further purpose of the
present invention is to provide structural, conformational, and statistical information
which will allow users to suggest or identify new molecules which might be used as
substrates/ligands. Finally, it is a purpose of the present invention to provide an
interactive graphics environment in which the various parameters of shape can be
studied in a quantitative fashion in order to obtain a more thorough knowledge of
the nature of intermolecular interactions.

Brief Description Of Drawings

Figure 1 is a schematic illustration and overview of the CoMFA method.
Figure 2 is a schematic illustration of the cross-validation procedure.
Figure 3A shows a scatter plot in three-dimensional lattice space of a
steric CoMFA solution.
Figure 3B shows a contour plot in three-dimensional lattice space of the
same CoMFA solution shown in Figure 3A.
Figure 4A is the scatter plot of Figure 3A with a molecule superimposed
so that the three-dimensional relationship of the molecule to the CoMFA solution
can be seen.
Figure 4B is the contour plot of Figure 3B with a molecule superimposed
so that the three-dimensional relationship of the molecule to the CoMFA solution
can be seen.
Figure 5 is a schematic illustration of the integration of the CoMFA
software into a standard molecular modeling environment.
Figures 6 through 9 are process flow charts illustrating the
interrelationships of the major features of the present invention.

Detailed Description Of The Invention

The present invention overcomes the limitations of earlier 3D-QSAR
approaches and allows significant insights into molecular interactions never before
achieved without actual X-ray crystallographic knowledge of the receptor binding
sites. In fact, to the extent X-ray results provide only a static picture, the present
invention provides more detailed knowledge of the shape differences operative in
dynamic interactions between molecules in solution. The Comparative Molecular
Field Approach (CoMFA) is a heuristic procedure for defining, manipulating, and
displaying the differences in molecular fields surrounding molecules which are
responsible for observed differences in the molecules' activities. This description
of CoMFA is arranged in two progressively more detailed sections: first, an
overview of the entire process; and second, descriptions of the individual
components including a rationale for each component and the differences with the
prior art.

CoMFA Overview 
Once a series of molecules, for which the same biological interaction
parameter has been measured, is chosen for analysis, the three-dimensional
structure for each molecule is obtained, typically from the Cambridge
Crystallographic Database or by standard molecular modeling techniques. The
three-dimensional structure for the first molecule is placed within a
three-dimensional lattice so that the positional relationship of each atom of the molecule
to a lattice intersection (grid point) is known. A probe atom is chosen, placed
successively at each lattice intersection, and the steric and electrostatic interaction
energies between the probe atom and the molecule calculated for all lattice
intersections. These calculated energies form a row in a conformer data table
associated with that molecule.
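The lattice sampling step can be sketched as follows. This is a minimal illustration, not the software of the invention: the 6-12 steric form, the Coulomb conversion factor of 332 (kcal/mol with charges in electron units and distances in angstroms), the energy cutoff, the function names, and all atom parameters are assumptions chosen for the example.

```python
import numpy as np

def make_lattice(lower, upper, spacing):
    """Return an (n_points, 3) array of lattice intersections filling a box."""
    axes = [np.arange(lo, hi + 1e-9, spacing) for lo, hi in zip(lower, upper)]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

def field_row(coords, A, B, charges, lattice, probe_charge=1.0, cutoff=30.0):
    """Steric and electrostatic probe energies at every lattice point, as one row."""
    r = np.linalg.norm(lattice[:, None, :] - coords[None, :, :], axis=2)
    r = np.maximum(r, 1e-6)  # guard against a lattice point sitting on an atom
    steric = np.sum(A / r**12 - B / r**6, axis=1)
    electro = np.sum(332.0 * charges * probe_charge / r, axis=1)
    # Truncate very large energies near atomic nuclei (assumed cutoff value).
    steric = np.clip(steric, None, cutoff)
    electro = np.clip(electro, -cutoff, cutoff)
    return np.concatenate([steric, electro])

# Hypothetical three-atom fragment (angstroms) with made-up parameters.
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.2, 1.2, 0.0]])
A = np.array([1.0e5, 1.0e5, 5.0e4])
B = np.array([1.0e2, 1.0e2, 8.0e1])
charges = np.array([-0.2, 0.1, 0.1])

lattice = make_lattice(lower=(-4, -4, -4), upper=(4, 4, 4), spacing=2.0)
row = field_row(coords, A, B, charges, lattice)
```

With a 2 angstrom spacing this small box already gives 125 lattice points and 250 energies per molecule, which shows how quickly the number of columns outgrows the number of molecules.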

Interaction energies for additional conformations of the first molecule
may be similarly calculated. After each row of interaction energies is calculated
for each conformer, the conformer may be aligned by a field fit procedure which
minimizes the energy differences at each lattice point between that conformer and
the first conformer. The field fit interaction energy values for each conformer are
then entered into the data table for the first molecule. Once the interaction energies
for all conformations of the first molecule have been calculated, an averaged value
of the interaction energies at each lattice point of all the conformers becomes the
first row in a 3D-QSAR data table associated with the measured biological
parameter for the first molecule.

An identical procedure is followed for all the molecules in the series.
After the averaged field values over the conformations for a particular molecule are
determined, a field fit minimization of the averaged field values to the field values
of an alignment molecule aligns the new molecule to the others in the series. The
upper portion of Figure 1 diagrammatically shows how the 3D-QSAR table is
constructed. For each lattice intersection, the steric or electrostatic interaction energy
with a test probe atom placed at the lattice point is entered into the appropriate
steric or electrostatic column associated with that point. The intersection points are
numbered sequentially, and the corresponding column identified as steric (S) or
electrostatic (E).
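One way to lay out such a table is sketched below. The ordering (all steric columns followed by all electrostatic columns), the label format, and the function names are assumptions for illustration; Figure 1 may arrange the columns differently.

```python
import numpy as np

def column_labels(n_points):
    """Labels S1..Sn then E1..En for sequentially numbered lattice points."""
    return ([f"S{k}" for k in range(1, n_points + 1)] +
            [f"E{k}" for k in range(1, n_points + 1)])

def build_table(steric_rows, electro_rows, activities):
    """Stack per-molecule field rows into X and the measured activities into y."""
    X = np.hstack([steric_rows, electro_rows])
    y = np.asarray(activities, dtype=float)
    return X, y, column_labels(steric_rows.shape[1])

# Three hypothetical molecules sampled at four lattice points (made-up numbers).
steric = np.arange(12, dtype=float).reshape(3, 4)
electro = -steric
X, y, labels = build_table(steric, electro, [6.1, 7.4, 5.2])
```

Keeping a fixed label for every column is what later lets each fitted coefficient be traced back to a specific lattice point and field type.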

Once the data (interaction energies and measured biological activity) for
all molecules in the series are entered into the 3D-QSAR data table, a Partial Least
Squares (PLS) analysis is performed which includes a cross-validation procedure.
Using the interaction energies for each lattice position and the biological values, in
essence, PLS solves a series of equations with more unknowns than equations. As
shown in the lower portion of Figure 1, the resulting solution is a series of
coefficients, one for each column, the value of which (in energy units) reflects the
contribution of the interaction energies at that lattice position to differences in the
measured biological parameters.
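The combination of PLS with cross-validation can be sketched in a few dozen lines. The code below is a generic single-response PLS (the NIPALS PLS1 algorithm) with leave-one-out cross-validation; it is not the PLS method as modified for CoMFA, and the function names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Generic PLS1 (NIPALS); returns column means, activity mean, and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            break
        w /= norm
        t = Xk @ w                         # latent score for each molecule
        tt = t @ t
        p = Xk.T @ t / tt                  # X loading
        c = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)           # deflate X and y
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)  # one coefficient per column
    return x_mean, y_mean, beta

def pls1_predict(model, X):
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean

def loo_press(X, y, n_components):
    """Leave-one-out cross-validation: sum of squared prediction errors."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        press += (pls1_predict(model, X[i:i + 1])[0] - y[i]) ** 2
    return press

# Synthetic table: 8 molecules, 50 columns driven by one hidden factor plus noise.
rng = np.random.default_rng(1)
t = rng.normal(size=8)
X = np.outer(t, rng.normal(size=50)) + 0.05 * rng.normal(size=(8, 50))
y = 2.0 * t + 0.05 * rng.normal(size=8)

press = loo_press(X, y, n_components=1)
q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)  # cross-validated r-squared
model = pls1_fit(X, y, n_components=1)
```

The cross-validated statistic measures how well activities of left-out molecules are predicted, which guards against accepting one of the many coefficient sets that merely reproduce the training data.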
While the solution has many terms, the one-to-one correspondence
between a term and a lattice point allows the solution to be presented as an
intense color-coded three-dimensional image, either in the form of a graph
visually similar to the top part of Figure 1 (see Figure 3A), the color of a point
signifying the magnitude of the corresponding term, or better, with term values
summarized in contour form (see Figure 3B). The graphic representation clearly
shows the area in molecular space where the 3D-QSAR strongly associates changes
in molecular field values with changes in the measured biological parameter.
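The mapping from coefficients back to positions in space can be sketched as follows. This is an illustration only: it assumes the coefficients for one field type are stored in the same order in which the lattice was flattened (row-major), and the function names, lattice origin, and spacing are hypothetical.

```python
import numpy as np

def coefficient_grid(coeffs, shape):
    """Reshape one block of per-point coefficients onto the 3D lattice (row-major)."""
    return np.asarray(coeffs, dtype=float).reshape(shape)

def strongest_points(grid, lattice_origin, spacing, fraction=0.1):
    """Coordinates and values of lattice points with the largest |coefficient|."""
    flat = np.abs(grid).ravel()
    k = max(1, int(round(fraction * flat.size)))
    idx = np.argsort(flat)[-k:]                      # indices of the top-k magnitudes
    ijk = np.column_stack(np.unravel_index(idx, grid.shape))
    return lattice_origin + spacing * ijk, grid.ravel()[idx]

# Hypothetical 3 x 3 x 3 steric coefficient block with one dominant term.
coeffs = np.zeros(27)
coeffs[15] = 5.0                                     # flat index 15 is lattice index (1, 2, 0)
grid = coefficient_grid(coeffs, (3, 3, 3))
xyz, values = strongest_points(grid, np.array([-2.0, -2.0, -2.0]), 2.0, fraction=0.01)
```

The returned coordinates can be passed to any plotting tool as a scatter plot, and the reshaped grid is the natural input for contouring.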

Molecular Force Fields 
As pointed out above, biochemists and biophysicists have come to believe
that intermolecular interactions are highly stereo specific, depending mainly on
shape complementarity, and that biological molecules solve a three-dimensional
jigsaw puzzle every time they bind. However, the prior art 3D-QSAR shape
descriptors mentioned earlier give only a net measure of overall shape, totally
averaging out local topological differences between molecules. These methods are
actually aggregate indices, which describe a "shape" only to the same extent, for
example, that the comparative "shape" of two sculptures is described by measuring
their differential weight or volume. Likewise, ball and stick molecular models do
not reflect either the steric interactions of extended molecular orbitals or charge
associated interactions.

In order to describe molecular shape, a descriptor should be sensitive to
at least three molecular parameters: first, it should account for the true steric bulk
of each atom in the molecule; second, it should account for electrostatic interactions
of each atom in the molecule; and, third, it should be a fine enough measure to
reflect every local topological feature of the molecule. The approach used in
CoMFA is that a suitable sampling of the steric and electrostatic interactions of a
molecule will suffice to answer most questions about its possible shape dependent
receptor interactions. The calculation of interaction energies at lattice points
surrounding a molecule is not, in itself, new. Others have tried to use this
approach as an estimate of molecular shape. For example, Goodford has described
the use of probe-interaction "grids" similar to those calculated in the present
invention. See Goodford, P.J. J. Med. Chem. 1985, 28, 849.
20 In theory, the row of interaction energy data generated from all lattice 

points contains most of the information describing how a molecule "looks" to a 
receptor in three dimensions. However, before this invention, no one has 
discovered how to compare the shapes of different molecules represented by these 
rows of data or to extract and visualize useful information about the shape 
25 differences which are important to molecular associations. 

The fineness or resolution with which the shape of a molecule is
described by this method depends on three factors: 1) the steric size of the test
probe atom, 2) the charge on the test probe atom, and 3) the lattice spacing. The
invention allows the user to specify both the steric size and charge of the test
probe. In addition, the probe parameters may be varied at different lattice locations
which the user believes call for finer or coarser measurements. The user may also
select the lattice spacing. Typically probe atom size is varied from that of covalent
hydrogen (-H), to sp3 carbon, to sp3 oxygen, to divalent sulfur. The probe charges
used typically are +1.0 and 0.0, while lattice spacing values of 1.0 to 4.0
angstroms are frequently employed.
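
As an illustration of the user-selected lattice spacing, the following is a minimal Python sketch that enumerates the points of a regular lattice filling a rectangular box. The function name `lattice_points`, the box-corner arguments, and the rectangular shape of the region are assumptions made for this example; the text above specifies only that the spacing is chosen by the user.

```python
def lattice_points(lower, upper, spacing):
    """Return (x, y, z) points of a regular lattice spanning a box.

    lower, upper: opposite corners of the box, in angstroms (hypothetical inputs).
    spacing: lattice spacing in angstroms (1.0 to 4.0 is typical per the text).
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    axes = []
    for lo, hi in zip(lower, upper):
        # Number of whole steps that fit along this axis; the small epsilon
        # guards against floating-point round-off at the upper edge.
        steps = int((hi - lo) / spacing + 1e-9)
        axes.append([lo + k * spacing for k in range(steps + 1)])
    return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]
```

Each returned point corresponds to one lattice location at which the probe atom would be placed; halving the spacing multiplies the number of points (and hence table columns) roughly eightfold.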

Van der Waals radii are generally used for the steric calculation and
atomic charges can be calculated from knowledge of atomic coordinates. Thus, the
steric interaction energy is calculated as:

$$\sum_{i=1}^{N_{at}} \left( \frac{A_i}{r_i^{12}} - \frac{B_i}{r_i^{6}} \right)$$

where $N_{at}$ is the number of atoms in the biomolecule; $r_i$ is the distance between the
probe atom and the ith atom in the biomolecule; and $A_i$ and $B_i$ are constants
characteristic, respectively, of the probe atom type and the type of the ith atom in
the biomolecule. [Other values can be selected by the user as an option in place of
the exponent 12.] The electrostatic interaction energy is calculated as:

$$\sum_{i=1}^{N_{at}} \frac{Q \, q_i}{r_i^{2}}$$

where $N_{at}$ and $r_i$ are the same as for the steric calculation; $Q$ is the charge on
the probe atom; and $q_i$ is the charge on the ith atom. The $q_i$ may be calculated by
the method of Gasteiger and Marsili. See Gasteiger, J.; Marsili, M. Tetrahedron
1980, 36, 3219. [The user may omit the exponent 2 as a user option.] Since the
probe atom is placed successively at all lattice points, for those points within the
molecule the steric repulsion values can become enormous. Since there is no
significance to the absolute value other than to estimate how much atomic volume
overlap exists, whenever the probe atom experiences a steric repulsion greater than
a "cutoff" value (30 Kcal/mole typically), the steric interaction is set to the value
"cutoff" and the electrostatic interactions are set to the mean of the other
molecules' electrostatic interactions at the same location. These cutoff values may
be selected by the user of the invention. Obviously no topological information
is lost.

It should be recognized that any property which can be calculated from a
molecular model, such as interatomic distances or torsion angles, can become an 
additional column in the 3D-QSAR table. Columns may also contain values of 
other orientation averaged molecular properties (such as logP or heats of 
formation), data defined as functions of other columns, or even data that is 
calculated via custom procedures provided by the user. In addition, since measured 
biological activity is a consequence both of a molecule's ability to get to the 
receptor site as well as to bind to the receptor, additional terms (columns) which 
reflect molecular diffusion may be incorporated. It should be appreciated that the 
statistical and visual correlation of data in the columns by the methods of the 
present invention is not limited to interaction energy shape descriptors. 
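
The steric and electrostatic probe energies that fill the lattice columns of the table, together with the cutoff rule described earlier in this section, can be sketched in Python as follows. This is an illustrative sketch only: the function names, the `(x, y, z, A, B, q)` tuple layout, and the use of `None` to mark an electrostatic value awaiting replacement by the mean of the other molecules are assumptions of this example, and no unit conversion factor is applied to the electrostatic sum.

```python
def probe_energies(atoms, point, probe_charge=1.0, steric_cutoff=30.0):
    """Return (steric, electrostatic, truncated) for one lattice point.

    atoms: list of (x, y, z, A, B, q) tuples (hypothetical layout), where A and
    B are the 6-12 constants for the probe/atom pair and q is the atomic charge.
    """
    steric = 0.0
    electrostatic = 0.0
    for x, y, z, a, b, q in atoms:
        r2 = (x - point[0]) ** 2 + (y - point[1]) ** 2 + (z - point[2]) ** 2
        if r2 == 0.0:
            # Probe sits exactly on a nucleus: treat as beyond the cutoff.
            return steric_cutoff, None, True
        r6 = r2 ** 3
        steric += a / (r6 * r6) - b / r6          # A/r^12 - B/r^6
        electrostatic += probe_charge * q / r2    # Q*q/r^2
    if steric > steric_cutoff:
        # Steric value is clamped; the electrostatic value is left as None so
        # it can later be set to the mean over the other molecules.
        return steric_cutoff, None, True
    return steric, electrostatic, False


def fill_truncated(column):
    """Replace None entries of one electrostatic column by the mean of the rest."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known) if known else 0.0
    return [mean if v is None else v for v in column]
```

Here `fill_truncated` would be applied column by column once every molecule in the series has been evaluated at the same lattice point.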

In fact, a most significant and powerful feature of the present invention is 
that the CoMFA method will yield information not even available from X-ray 
crystallographic studies since X-ray results present static pictures which are not 
totally dispositive of the dynamic interactions in solution. By comparison, the 
CoMFA model of the interaction is phenomenological. The actual measured 
activity is expressed or predicted in terms of determinable quantities. The present 
invention will display the dependence of the measured biological parameter on data 
(shape and other relevant information) contained in all columns. 

For development of the present invention, all positioning of molecules in 
the lattice was done with the SYBYL software program of Tripos Associates, Inc. 
However, there are several other programs available which are functionally 
equivalent and may be used with the present invention. Examples are: 



ChemX - from Chemical Design Ltd., Oxford, UK 
Insight - from BioSym Technologies, San Diego, CA 
Quanta - from Polygen, Waltham, MA 
ChemLab - from Molecular Design Ltd., San Leandro, CA 
MacroModel - from Prof. Clark Still, Columbia Univ. 



Such a host software program must support the building and storage of 
molecular models (retrieval of the atomic coordinates) plus the calculation of 
atomic charges (for electrostatic field computation) and the tabulation of steric 
parameters by atomic type (for steric field computation).

Alignment and Field Fit

CoMFA works by comparing the interaction energy descriptors of shape
and relating changes in shape to differences in measured biological activity. Since
the shape descriptors are calculated at each lattice point, the lattice site-specific
interaction energies calculated for the same molecule offset by even one lattice
point will be significantly different. A CoMFA analysis of this data will show
differences in shape where there are none. Therefore, the positioning of a
molecular model within the fixed lattice is by far the most important input variable
in CoMFA since the relative interaction energies depend strongly on relative
molecular positions within the lattice.

The Field Fit feature of the present invention aligns molecules to
minimize their field differences rather than atomic coordinate differences. Since
the interaction energies reflect molecular shape, they can be quantitatively
manipulated for shape alignment. This is a particularly suitable approach since the
intermolecular comparisons are based on these same energy fields.

In Field Fit, any molecule may be used as the reference. However, if
fitting conformations of the same molecule, the conformation which from other
considerations is most likely to be the most active conformer would usually be used
as the comparison standard. When Field Fitting the final series of test molecules,
the molecule with the greatest biological activity would usually be used as the
reference. In the Field Fit alignment, the root mean squared (RMS) difference in
the sum of steric and electrostatic interaction energies averaged across all lattice
points, between the new molecule and the reference molecule or set of molecules,
is minimized with respect to the six rigid-body degrees of freedom, any
user-specified torsion angles, and any change in internal geometry. The user has the
option before Field Fitting of weighting those lattice positions which he believes
from other considerations may be particularly significant to the alignment of a
given molecular series or conformation. The results of Field Fit alignments or test
alignments using weighting factors may be displayed and compared visually as
three-dimensional scatter or contour plots in the same manner as discussed later for
all graphic output.

With reference to the 3D-QSAR table of Figure 1, Field Fitting molecule
2 to molecule 1 would correspond to minimizing the sum of squared differences
between the values in all but the first column of the first and second rows of the
table, by altering the position and/or torsion angles of molecule 2. Field Fit also
requires for satisfactory results a steric repulsion beyond the lattice boundary and,
when torsion angles are varied, the conventional molecular mechanics internal
energy calculated using the same force field. The reason for the boundary steric
repulsion is as follows. The function being minimized can be visualized as being
similar in shape to the cross-section of a volcano. The steric boundary repulsion is
needed because the answer sought in the minimization is the crater, but if the
molecules are not nearly aligned or field-fit to begin with, the down-hill
(minimization) direction will be down the outside of the volcano: that is,
minimization of the difference in fields will push the molecules apart. By placing
steric repulsion at the edge of the lattice region, the down-hill direction along the
outside of the volcano will be disfavored.

Field Fit also allows the user to address the relative weighting of the
three different contributions to the function being minimized, namely: the field
difference itself, the edge steric repulsion, and the differing internal energies as
torsional bonds and other internal geometries are altered. The weighting choice is
a user option in the software program implementing the invention. The Field Fit
ability to see in an interactive graphics environment the three-dimensional
consequences of various weighting choices on molecular alignment is in itself a
significant advance in 3D-QSAR. Minimization is performed by the Simplex
method (a widely available algorithm), with step sizes such that individual atoms
initially move no more than 0.2 Angstroms. The Simplex method is preferred
because the function being minimized does not have analytical derivatives.
Convergence occurs when successive function evaluations vary less than 1%. As
with any minimization, Field Fit will find a best alignment if the final geometry is
expected to closely resemble (be "downhill" from) the starting geometry.

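
The function that Field Fit minimizes, with its three weighted contributions and the 1% convergence test, might be sketched in Python as below. This is a hedged illustration rather than the FFIT.C implementation: the function names, the way the three contributions are combined as a weighted sum, and the normalization of the lattice-weighted RMS by the sum of weights are all assumptions of this example. A derivative-free Simplex routine would call `field_fit_objective` repeatedly while adjusting the rigid-body and torsional variables.

```python
import math


def rms_field_difference(fields, reference, lattice_weights=None):
    """Weighted RMS difference between two rows of lattice energies."""
    if lattice_weights is None:
        lattice_weights = [1.0] * len(fields)
    total = sum(w * (f - r) ** 2
                for w, f, r in zip(lattice_weights, fields, reference))
    return math.sqrt(total / sum(lattice_weights))


def field_fit_objective(fields, reference, edge_repulsion, internal_energy,
                        w_field=1.0, w_edge=1.0, w_internal=1.0,
                        lattice_weights=None):
    """Combine the three contributions named in the text (assumed weighted sum)."""
    return (w_field * rms_field_difference(fields, reference, lattice_weights)
            + w_edge * edge_repulsion
            + w_internal * internal_energy)


def converged(previous, current, tolerance=0.01):
    """True when successive function values differ by less than 1% (default)."""
    if previous == 0.0:
        return current == 0.0
    return abs(current - previous) / abs(previous) < tolerance
```

Setting `w_edge` to zero in such a sketch would reproduce the "outside of the volcano" problem described above, since nothing would then penalize moving the molecule out of the lattice.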
The CoMFA software programs implementing the invention allow other
alignment procedures to be followed, such as the standard Fit and Orient routines.
For instance, Fit utilizes least squares superposition of user specified sets of nuclei
of atoms with or without relaxation of internal geometry, while Orient takes three
user specified atoms and places the first atom at the origin, the second atom along
the x axis, and the third atom in the xz plane. The user may even attempt trial and
error alignment based on an educated guess or other 3D-QSAR data. Field Fit is
particularly useful when a CoMFA based upon some other alignment method gets
too low a cross-validated r², caused in turn by one or more molecules having very
large residuals (a very large difference between predicted and actual properties in
the cross-validation step). A Field Fit of the compound(s) with very large residuals
should produce a new alignment which will lead to improvement when the CoMFA
is repeated. The conformers used in a CoMFA QSAR analysis may be aligned by
any of these alignment procedures either before the CoMFA analysis or within the
CoMFA analysis.

The Field Fit procedure also has important applications when used to
maximize rather than minimize field differences. If the differences in the
interaction energies of two shape complementary molecules are maximized, Field
Fit will produce the best three-dimensional alignment or "docking" between the
molecules. Thus, if the structures of both the substrate and enzyme (or antigen and
antibody) are known, Field Fit will find their optimal alignment.

Conformation Selection
A major unsolved problem in prior art approaches to 3D-QSAR is the
determination of the proper molecular conformation to use in an analysis. Absent
any direct knowledge of the actual active conformation responsible for biological
activity, previously the only approach has been to make an educated guess.
CoMFA, using Field Fit, allows a quantitative approach to conformation selection.
It is possible using the CoMFA software programs to enter into a separate data
table the interaction energies of every conformer and fit it to a selected template
conformer. Various averaging or weighting schemes can then be employed as user
options to determine a most representative conformer. The interaction energies for
the various conformations can be weighted based on reasonable assumptions about
the likelihood of certain conformations being most active without totally excluding
contributions from presumably less active forms. In the alternative, since most
conformations are believed to be equilibrated in free aqueous solution at normal
temperatures, the CoMFA programs permit the weighting to reflect a Boltzmann
distribution over the energy of the conformers. Only in the case of a highly labile
molecule (one possessing multiple rotamers and tautomers) would a Boltzmann
distribution produce a fuzzy, averaged, and meaningless ball. CoMFA with Field
Fit provides the ability to use these various weighting functions to determine a form
of the molecule which the receptor site is most "likely" to see.
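
The Boltzmann weighting option can be illustrated with a short Python sketch. The function names, the choice of kcal/mole energy units, and the default temperature of 298.15 K are assumptions of this example; the text states only that the weighting may reflect a Boltzmann distribution over the conformer energies.

```python
import math

# Gas constant in kcal/(mol K); the energy unit is an assumption of this sketch.
R_KCAL = 0.0019872041


def boltzmann_weights(energies, temperature=298.15):
    """Return normalized Boltzmann weights for a list of conformer energies."""
    kt = R_KCAL * temperature
    lowest = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - lowest) / kt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def weighted_fields(conformer_fields, weights):
    """Weight-average the lattice energies of several conformers, column by column."""
    n_columns = len(conformer_fields[0])
    return [sum(w * row[j] for w, row in zip(weights, conformer_fields))
            for j in range(n_columns)]
```

With these weights, a conformer 1 kcal/mole above the lowest-energy form still contributes to the averaged row, which matches the stated aim of not totally excluding presumably less active forms.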

PLS - Partial Least Squares Analysis 
As mentioned earlier, the inherently underdetermined nature of a 3D-QSAR
table with many more columns than rows has previously posed an insolvable
problem which prohibited use as a shape descriptor of the interaction energies
calculated at thousands of lattice points. The values in the data table can be viewed
as a system of equations with many more unknowns than equations. For instance,
for three molecules the following three equations can be written:

$$
\begin{aligned}
\mathrm{Value}_1 &= b_1 + A_{001} S_1(001) + A_{002} S_1(002) + \cdots + A_N S_1(N) + a_{001} E_1(001) + a_{002} E_1(002) + \cdots + a_N E_1(N) \\
\mathrm{Value}_2 &= b_2 + A_{001} S_2(001) + A_{002} S_2(002) + \cdots + A_N S_2(N) + a_{001} E_2(001) + a_{002} E_2(002) + \cdots + a_N E_2(N) \\
\mathrm{Value}_3 &= b_3 + A_{001} S_3(001) + A_{002} S_3(002) + \cdots + A_N S_3(N) + a_{001} E_3(001) + a_{002} E_3(002) + \cdots + a_N E_3(N)
\end{aligned}
$$



where the Values are the measured biological activities for each molecule; $b_x$ is the
intercept for each equation for molecule x; $A$ and $a$ are the coefficients of the
steric and electrostatic terms which reflect the relative contribution of each spatial
location, the subscripts indicating both different coefficient values and the lattice
positions with which the values are associated; $S_x(N)$ and $E_x(N)$ are the steric and
electrostatic interaction energies calculated at lattice position N (where N ranges
from 1 to the maximum number of lattice intersection points) for molecule x. The
partial least squares (PLS) method of multivariate analysis "solves" this apparently
underdetermined system of equations by a series of orthogonal rotations in
hyperspace of both the independent and dependent variable matrices, in each
rotation maximizing the commonality between the independent and dependent
variable matrix. (In contrast, classical least-squares regression rotates the
independent variable columns individually and independently, rather than together,
thus consuming a degree of freedom for each coefficient estimated.) The solution
to the equations found by PLS is the set of values of the coefficients which come
closest to making each equation true. PLS is particularly attractive for CoMFA
since it involves only two vector-matrix multiplications, can perform the calculation
on raw data, and can solve large problems on a smaller computer.
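
To make the idea of fitting a table with more columns than rows concrete, here is a minimal pure-Python sketch of the textbook NIPALS form of single-response PLS (often called PLS1). It is not the PLS.FOR program of the invention: the function names are hypothetical, the columns are mean-centered before fitting (a common convention assumed here), and the step that expresses each component in terms of the original columns uses the standard rotation recursion.

```python
import math


def pls1(X, y, n_components):
    """Fit PLS1 by NIPALS; return (intercept, coefficients per column)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    rotations, loadings = [], []
    coef = [0.0] * m
    for _ in range(n_components):
        # Weight vector: direction in column space most covariant with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        c = sum(f[i] * t[i] for i in range(n)) / tt
        # Remove this component from both matrices (deflation).
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - c * t[i] for i in range(n)]
        # Express the component in terms of the original (centered) columns.
        r = w[:]
        for r_prev, p_prev in zip(rotations, loadings):
            dot = sum(pj * wj for pj, wj in zip(p_prev, w))
            r = [rj - dot * rpj for rj, rpj in zip(r, r_prev)]
        rotations.append(r)
        loadings.append(p)
        coef = [b + c * rj for b, rj in zip(coef, r)]
    intercept = y_mean - sum(b * xm for b, xm in zip(coef, x_mean))
    return intercept, coef


def predict(intercept, coef, row):
    """Evaluate the fitted linear equation for one table row."""
    return intercept + sum(b * v for b, v in zip(coef, row))
```

Because each extracted component is a single direction rather than one coefficient per column, a table of three molecules and four columns can still be fitted, which is the property the text relies on.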

An important improvement of PLS for use in CoMFA has been created in
which the initial PLS solution is rotated back into the original data space thereby
reexpressing the term coefficients obtained as the solution in terms of the original
metric space (in this case, energy values). Since this solution contains a potentially
nonzero coefficient for each column in the data table (in fact two for each lattice
point), it can therefore be displayed and contoured in three-dimensional space, just
like any other expression associating numerical values with known locations in
space.

Integral to finding a "solution" by PLS is a cyclic cross-validation
procedure. Cross-validation evaluates a model not by how well it fits data, but by
how well it predicts data. While useful in many situations, cross-validation is
critical for validating the underdetermined CoMFA 3D-QSAR table. A statistical
measure of the reliability of the PLS solution is calculated by defining a
cross-validated (or predictive) r² analogously to the definition of a conventional r² as
follows:

$$\text{cross-validated } r^2 = \frac{SD - PRESS}{SD}$$

where SD is the sum over all molecules of squared deviations of each biological
parameter from the mean and PRESS (predictive sum of squares) is the sum over
all molecules of the squared differences between the actual and predicted biological
parameters. A negative cross-validated r² will arise whenever PRESS is larger than
SD, that is, whenever the biological parameters are better estimated by the mean of
all measured values than by the solution under consideration.
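
The definition of the cross-validated r² above translates directly into a few lines of Python. The function name and argument names are hypothetical; the arithmetic follows the stated definitions of SD and PRESS.

```python
def cross_validated_r2(actual, predicted):
    """Return (SD - PRESS) / SD for paired actual and predicted values."""
    mean = sum(actual) / len(actual)
    sd = sum((a - mean) ** 2 for a in actual)                    # spread about the mean
    press = sum((a - p) ** 2 for a, p in zip(actual, predicted)) # prediction error
    return (sd - press) / sd
```

For example, predictions that simply equal the mean of the measured values give PRESS equal to SD and therefore a value of zero, while predictions worse than the mean give a negative value, as the text notes.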

The cross-validation procedure is integrated with PLS as follows. First
the entire 3D-QSAR data table is analyzed by PLS and one component extracted in
hyperspace. [The projection of this component onto all the orthogonal planes in
hyperspace yields components on all the planes which are the equation coefficients
sought.] The PLS analysis is then repeated (the equation coefficients rederived)
with a randomly chosen molecule (row) excluded. The resulting coefficients are
used to calculate (predict) the biological value for the excluded molecule (row) and
a new r² is calculated. [In actual practice the software program also permits the
exclusion of random subsets of molecular values and calculations of the excluded
biological values. This reduces the time necessary to compute a first set of
coefficients. In a full detailed analysis, each molecule (row) is individually excluded.]
This omission, rederivation, and prediction procedure is repeated until every
biological parameter value has been predicted by a set of coefficients from whose
derivation it was excluded. Figure 2 shows a schematic outline of this
cross-validation procedure. Note how the solution coefficients derived by PLS without
the excluded row are used with the interaction energy values from the excluded row
in the equation to predict the biological value of the excluded molecule.
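
The omission, rederivation, and prediction cycle can be sketched generically in Python. In this illustration the model is passed in as a pair of hypothetical `fit` and `predict` callables so that the loop itself stays independent of PLS; a one-column straight-line fit is used purely as a stand-in model. Rows are excluded in table order rather than at random, which visits the same set of exclusions in a full analysis.

```python
def leave_one_out(rows, values, fit, predict):
    """Predict each row from a model derived without it; return (predictions, PRESS)."""
    predictions = []
    for k in range(len(rows)):
        train_rows = rows[:k] + rows[k + 1:]        # omit row k
        train_values = values[:k] + values[k + 1:]
        model = fit(train_rows, train_values)       # rederive the coefficients
        predictions.append(predict(model, rows[k])) # predict the omitted value
    press = sum((v - p) ** 2 for v, p in zip(values, predictions))
    return predictions, press


def fit_line(rows, values):
    """Stand-in model (assumption of this sketch): least-squares line on column 0."""
    xs = [r[0] for r in rows]
    mx = sum(xs) / len(xs)
    my = sum(values) / len(values)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, values)) / sxx
    return my - slope * mx, slope


def predict_line(model, row):
    """Evaluate the stand-in line for one row."""
    intercept, slope = model
    return intercept + slope * row[0]
```

The returned PRESS is the quantity that enters the cross-validated r² defined earlier; substituting a PLS fit for `fit_line` would give the procedure outlined in Figure 2.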

Values of r² and PRESS are calculated for each cross-validation cycle. If
there is no correlation amongst the data, the coefficients derived will not give
meaningful predicted values and the PRESS will exceed the SD. The r² values
indicate how good the components are that result from the extraction.

Next, the contribution of the first component already obtained is removed
from the matrix hyperspace, a second PLS analysis performed, and an additional
component extracted. Another cross-validation round is completed, again
completing the omission, rederivation, and prediction cycle. The user specifies the
number of times the extraction cross-validation procedure is repeated. The
extracted components are added, rotated back into the data space, and the resulting
coefficients generated.

The outcome of the PLS/cross-validation analysis of the data table is a set
of coefficients (one for each column in the data table) which, when used in a linear
equation relating column values to measured biological values, best predict the
observed biological properties in terms of differences in the energy fields among
the molecules in the data set, at every one of the sampled lattice points.

Graphic Display

The final step in a CoMFA is the display of the analytical results in a
manner meaningful to the biochemical researcher. In general, the human eye and
brain are much more skillful in recognizing complex patterns within a picture than
within a table of numbers. CoMFA outputs are uniquely able to utilize this
inherent advantage in graphic presentations since the three-dimensionality of the
input data is retained throughout. Indeed, the chemists who will use CoMFA are
among the most visually oriented classes of scientists. Thus, in addition to its
power, CoMFA is also much more graphically oriented than other 3D-QSAR
approaches, in both its input requirements (molecular models) and its output
(scatter plots and contour maps). Literally, the only number with which the
end-user needs to be concerned is the cross-validated r², the figure-of-merit for a
CoMFA analysis.

It should be evident that due to the manner in which the CoMFA 3D-QSAR
methodology is structured, that is, as an attempt to relate differences in
biological activity to differences in shape, the commonly shaped areas among the
test molecules should not contribute strongly to the solution. Similarly, not all
areas of shape difference should be reflected in larger contributions in the solution,
but only those areas of shape difference related most strongly to the biological
differences. A significant achievement of the present invention is that its solution
to the 3D-QSAR interaction energy data table provides a quantitative comparison of
molecular shape. Also, because the PLS solution was rotated back into the data
set, the determined coefficients have the same units as the data values, and
therefore, each term represents its contribution to functionality in the same units
from which it was derived, i.e., interaction energies. In general, the larger the
magnitude of a coefficient, the more strongly its associated spatial position is
related to the observed biological differences. The sign of the coefficient is related
to the sign of the effect of the change on the biological difference.

Further, the terms in the solution are uniquely associated with positions
in three-dimensional space (lattice coordinates) since the solution preserves the
column structure of the data table. Therefore, a graphic plot in three dimensions of
the term values (lattice point by lattice point) results in a display of the regions in
space most responsible for predicting changes in molecular functionality.

For comparison and study, several values representative of each term
may be displayed for each point:
1) the standard deviation of the column values times the 3D-QSAR
coefficient;
2) the 3D-QSAR coefficient only;
3) the standard deviation of the column only;
4) the column value for one of the molecules;
5) the column value for a molecule times the 3D-QSAR coefficient; or
6) any data from an external file.
The values for steric and electrostatic terms may be displayed separately or in
combination.
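
The first five display options listed above can be sketched in Python as follows. The function names and the numeric `option` argument are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) rather than the population form is an assumption, since the text does not say which is meant. Option 6 (external file data) is omitted because it involves no calculation.

```python
import statistics


def column_stdevs(table):
    """Sample standard deviation of each column of a row-major table."""
    return [statistics.stdev(column) for column in zip(*table)]


def display_values(table, coefficients, option, molecule=0):
    """Return one value per column for display options 1 through 5."""
    stdevs = column_stdevs(table)
    row = table[molecule]
    if option == 1:   # standard deviation times coefficient
        return [s * c for s, c in zip(stdevs, coefficients)]
    if option == 2:   # coefficient only
        return list(coefficients)
    if option == 3:   # standard deviation only
        return stdevs
    if option == 4:   # column value for one molecule
        return list(row)
    if option == 5:   # column value for a molecule times coefficient
        return [v * c for v, c in zip(row, coefficients)]
    raise ValueError("option must be 1 through 5")
```

Option 1 is informative because a column with no variation across the molecules yields zero regardless of its coefficient, so commonly shaped regions drop out of the display.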

Two methods of graphic presentation are utilized. First, the terms can be
presented as a three-dimensional scatter plot color coded to represent the magnitude
and sign of the association between the energy field change and biological activity
at each lattice point. Thus, in Figure 3A, the blue dots represent solution
coefficients whose values indicate that nearby increases in molecular size would
increase molecular binding, while the yellow areas indicate that nearby increases in
size would decrease molecular binding. The molecular modeling program used
originally to place the molecules into the lattice may be used to superimpose any
one of the molecules from the data set onto the three-dimensional display so that
the colored areas of significance may be more easily identified with specific atomic
positions as is shown in Figure 4A.

The second method of viewing the information is to plot contours in
space. The contour lines connect points (terms) in lattice space having similar
values. The contours form polyhedra surrounding space where the values are
higher or lower than a user selected cutoff value. The colored polyhedra in each
map surround all lattice points where CoMFA strongly associates changes in field
values with differences in the biological parameter. Figure 3B shows a contour
plot by itself while Figure 4B shows a contour plot with a molecule from the data
set superimposed for study and comparison.

These displays clearly show the user where either increased steric bulk or
increased electrostatic interaction in a region is related to greater biological affinity.
Conversely, also shown are those areas where increasing steric bulk or increasing
electrostatic interaction interferes (negatively relates) with biological affinity.
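
Before contours are drawn, the lattice points whose term values lie above or below the user-selected cutoffs must be identified. A minimal Python sketch of that selection step follows; the function name and the use of separate upper and lower cutoffs are assumptions of this example, and the construction of the enclosing polyhedra themselves is left to the graphics environment.

```python
def contour_regions(values, upper_cutoff, lower_cutoff):
    """Return indices of lattice points at or above the upper cutoff and
    at or below the lower cutoff (two separate lists)."""
    high = [i for i, v in enumerate(values) if v >= upper_cutoff]
    low = [i for i, v in enumerate(values) if v <= lower_cutoff]
    return high, low
```

The two index lists would then be drawn in different colors, corresponding to the regions where a field increase is associated with greater or lesser biological affinity.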

One can view CoMFA maps not only as three-dimensional representations
of molecular shapes which are significantly related to biological functionality, but
also as maps of the receptor spaces. In this view, the higher interaction areas
reflect stereo specific orientation requirements of the receptor. The map of the
steric terms gives an indication of the steric requirements of the receptor site, and
the map of the electrostatic terms gives an indication of the electrostatic
requirements of the receptor site. When combined with chemical knowledge of the
receptor site derived by other means, this information can lead to interesting and
predictive insights into the nature of the receptor site. This method is clearly
distinguished from the prior art, such as the distance geometry method, in that no
guess as to specific locations of atoms at the receptor site is needed before the
3D-QSAR is determined. In CoMFA, steric specific and electrostatic specific
information about the receptor site is derived from the 3D-QSAR. One caution
must be mentioned about over-interpreting the contour coefficient map as a receptor
map. In a highly underdetermined system such as is used in CoMFA, with many
times more coefficients to be evaluated than compounds, a number of 3D-QSAR
solutions to the data set may exist equally consistent with any given set of
compounds and data. While this does not diminish the predictive ability of the solution
found by the PLS/cross-validation method, it would suggest some caution be
utilized in interpreting the final map as a receptor site map.

Finally, the CoMFA map may be rotated and viewed from any desired
angle in order to more thoroughly appreciate the space specific information it
contains.

Predictive Power

A significant advance achieved by the present invention over the prior art
is the ability to quantitatively predict the likely biological behavior of a molecule
not included in the initial data set. A major impetus for developing 3D-QSAR to
describe intermolecular associations is that such an understanding should enable the
design of molecules with even higher biological affinities than those presently
known. One application of this ability would be the design of new and more
powerful/selective drugs. In the prior art, to the extent that suggestions as to
modified molecular structures could be made based on the results of QSAR
analyses, it was necessary to synthesize the suggested molecule and test it in the
relevant biological system before knowing whether a desired change had been
achieved. By comparison, the present invention allows immediate testing of
proposed molecular modifications against the CoMFA model-solution. Thus, based
upon the spatial areas shown by CoMFA to be significant to biological activity, a
new molecular configuration can be proposed. The proposed molecule can be
placed and aligned in the lattice structure, its interaction energies calculated, and
those energies entered into a 3D-QSAR equation using the coefficients derived from
the original data table. The equation yields a predicted biological value for the
molecule.

The calculated interaction energies for the proposed molecule can also be
displayed and compared to the initial CoMFA spatial maps. One can immediately
see on the resulting display whether the changes made in the molecule design are
associated with the same higher interaction energy terms and spatial areas as those
predicted from the CoMFA. It has been found that the CoMFA methodology
predicts with high accuracy the biological value of proposed molecules in those
cases where the molecules have been synthesized/tested or were unknown when the
CoMFA analysis was done. Thus, the present invention provides a quantitative
process for investigating the structures of unsynthesized molecules to determine
their likely biological activity. The importance of this ability to all aspects of
medicinal and biological chemistry can hardly be overstated.

In addition, the CoMFA methodology will permit the retrieval of
molecules with desired structures from among data bases of molecules whose
shapes are described by interaction energies. Indeed, it may be found that
unsuspected molecules, never tested in a given biological system, may possess the
proper shape to interact as well as or better than the known molecules.

CoMFA results can also direct the user towards determining the actual
conformer involved in the molecular interaction. As noted earlier, the final
CoMFA display identifies those spatial volumes most highly associated with
differences in biological activity. The user may superimpose any of the molecular
conformations, either as stick models or in interaction energy form, used in
generating the 3D-QSAR table onto the solution display to compare the shape of
that conformation to those critical spatial volumes.

To an extent, some correlation must exist since the solution is derived
from a table containing all conformations. However, the conformation which
comes closest to matching the requirements of the solution space can be used as the
principal conformation in generating another 3D-QSAR table. If the predictive r²
values for the new table solution are higher than for the first solution, the chosen
conformation is more likely to be the active conformation. This procedure may be
repeated as many times as the user feels necessary.

I, will be possible with fte process of me present invention to substan- 
tially reduco me rial and error approach «o dreg design with consequent savings of 
15 time, energy, and money. Extensive use of CoMFA shouid also .end * more rap.d 
deveiopment of life-saving drugs. As mentioned earlier, CoMFA can be used as 
well for outer types of intermodular associations such as studies in anugen 
antibody binding and changes in .he receptor site of geneticaUy altered enzymes 
All that is needed is some knowledge of the molecular environment involved such 
20 as me x-ray cryaal stiucture of tine enzyme and knowledge of how substituted 

amino acids fit into the X-ray structure. The discussion of me present rnventton » 
terms of substrate-enzyme binding affinities should be understood to be 
representative of me utility of CoMFA as it is appreciated a. the moment, but not 
in any way limiting of tine generality of the methodoiogy or process disclosed m 
25 mis invention. Indeed, the spatial maps generated are such an extraordinary 

powerful tool in investigating intermo.ecu.ar associations mat it is beUeved the full 
import of the process is yet to be fully realized. 

Utilizing CoMFA 

The present invention is intended to be utilized in conjunction with one of 
several molecular modeling environments now commercially available. These 
environments have different hardware and display capabilities, as well as different 
software modalities available. Typically, however, the computing and display 
functions equivalent to that found in the Evans and Sutherland Series 300 
molecular modeling units would be most helpful in practicing the present invention. 

Figures 6 through 9 are software flow charts disclosing the CoMFA 
methodology of this invention. The inventor uses six sections of CoMFA specific 
software code: FFIT.C, EVAL.C, PLS.FOR, MAP.C, Q3DEF.C, and 
DABDEF.C. Figure 5 schematically shows how the six CoMFA specific software 
programs are integrated into a standard molecular modeling environment. FFIT.C, 
EVAL.C, PLS, and MAP.C are programs while Q3DEF.C is a data description of 
all data for which the CoMFA programs look, and DABDEF.C contains the global 
data structures for software which manages tables of numbers. As mentioned 
earlier, several programs are commercially available which can be used to build 
molecules of interest and their conformers into three-dimensional lattice space, 
although the Tripos Associates, Inc. program SYBYL was used by the inventor. 
Similarly, while the inventor used the Tripos Associates, Inc. program DABYL to 
manage tables of numbers, functionally equivalent software includes RS/1 from 
BBN Software in Cambridge, MA. 

The methodology of the present invention provides for selection of input 
options/parameters by the user. The data structures for the input options/parameters 
are specified in Q3DEF.C while DABDEF.C specifies data structures for the 
data management program. EVAL.C creates a 3D-QSAR table from the 
information provided by the molecular modeling program and from the input of 
biological parameters. FFIT.C performs the Field Fit alignment procedure to 
properly align molecules, either conformers or the molecules in the tested series. 
PLS performs the Partial Least Squares Analysis and cross-validation of the 3D-QSAR 
table created by EVAL.C. Finally, MAP.C generates the spatial maps for 
graphic output. If and when it is desired to superimpose a molecular structure on 
the output maps of the CoMFA methodology, the standard molecular modeling 
programs can be used to do so. 

The CoMFA software programs provide the user with a number of 
options for fully utilizing the power of the CoMFA invention. A complete list of 
options is given below: 

In FIELD FIT: 

1) Calculations may be done either in "interactive" mode (progress 
monitored on the terminal) or in "batch" mode (run separately, with notification of 
the user when complete). 

2) Weighting of lattice points may be either: evenly; by QSAR 
coefficient; or by user-specified weights. 

3) The steric and electrostatic components may either be handled 
independently or summed together. 

4) Overall translations/rotations may be included or excluded. 

5) Torsional rotations may be included (if so, user to specify which 
ones) or excluded. 

6) How far should the molecule be moved in a trial move? (This is the 
initial value; it changes as Simplex minimization proceeds.) 

7) Convergence criterion: how small must the geometric change in 
successive steps be before field fit is considered done? 

8) Maximum number of steps before field fit quits, regardless of 
whether convergence has occurred. 

9) The template ("target") field may be from a single molecule 
(conformer) or from several molecules (conformers) averaged together. 

10) If in interactive mode (1 above), should the intermediate results be 
displayed after every 10 steps? 

11) If in interactive mode and displaying intermediate results, should 
the user be asked whether to continue after each display? 

12) Is this a regular field fit or else a "docking" field fit (where the 
objective is to maximize difference by field-fitting to the complement of the template 
field)? 

13) What should be done to save the result of the field fit? The options 
are: nothing, write to an external file, replace the molecule in the database. 
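
The quantity that the FIELD FIT options above control can be sketched as follows. This is a minimal illustration only, not the FFIT.C code: the function names are hypothetical, each field is assumed to be a flat list holding one interaction energy per lattice point, and the sign flip for a "docking" fit is one simple way to turn a maximization into a minimization that a Simplex routine could drive.

```python
import math


def field_rms_difference(field, template, weights=None):
    """Weighted RMS difference between two fields on the same lattice.

    `field` and `template` are equal-length sequences holding one
    interaction energy per lattice point (an assumed layout).
    """
    if len(field) != len(template):
        raise ValueError("fields must be sampled at the same lattice points")
    if weights is None:
        weights = [1.0] * len(field)  # option 2: weight lattice points evenly
    squared = sum(w * (f - t) ** 2 for w, f, t in zip(weights, field, template))
    return math.sqrt(squared / sum(weights))


def average_template(fields):
    """Option 9: build the target field by averaging several molecules."""
    return [sum(column) / len(fields) for column in zip(*fields)]


def field_fit_objective(field, template, weights=None, docking=False):
    """Value a minimizer such as Simplex would drive downward.

    A regular fit minimizes the RMS difference; a "docking" fit (option 12)
    flips the sign so that minimizing the objective maximizes the difference.
    """
    rms = field_rms_difference(field, template, weights)
    return -rms if docking else rms
```

In an actual fit, the minimizer would recompute `field` after every trial translation, rotation, or torsional change of the molecule and stop when the convergence criterion (option 7) or the step limit (option 8) is reached.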

In EVAL.C: 

1) The type of alignment to be performed on the molecule 
conformers. The options are: none, FIT, ORIENT, field fit. 

2) Should the results of alignment be stored back into the database? 

3) Should energy be "smoothed" (in which case the QSAR table value 
at a lattice point will be the average of the actual value and nine other values 
spaced evenly around that point)? 

4) Element/hybridization state of the probe atom (controls its steric or 
van der Waals properties). 

5) Charge of the probe atom (controls its electrostatic effect). 

6) Method of estimating van der Waals parameters (standard SYBYL 
method or calculated by Scott/Scheraga; the reference to Scott/Scheraga is in the code 
itself). 

7) Repulsive van der Waals exponent value (usually 12). 

8) Electrostatic exponent (usually 2, equivalent to a 1/r dielectric). 

9) Maximum steric value to be recorded in the 3D-QSAR table 
(usually 30 kcal/mole). 

10) Highest energy conformation to consider when representing a 
molecule as the average of conformations (excessively high energy conformers 
make vanishingly small contributions to the overall shape). 

11) Should a 3D-QSAR table column be excluded if ANY compound in 
the QSAR table contributes a maximum steric value? 

12) Should identities of excluded 3D-QSAR columns (which occur 
whenever there is no difference in value along a column in the table, because all 
compounds have a maximum steric value at that lattice point) be listed to terminal? 
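
How several of the EVAL.C options interact can be sketched as follows. This is a hedged illustration, not the EVAL.C code. The names, the atom tuple layout, the 0.1 distance floor, the Coulomb conversion factor of 332.06, and the pre-combined van der Waals coefficients are all assumptions made for the example. The second function shows a Boltzmann-weighted conformer average of the kind recited in the claims, and it treats the option 10 cutoff as an energy relative to the lowest conformer, which is also an assumption.

```python
import math

COULOMB_KCAL = 332.06     # assumed factor: kcal/mole for charges in e, distances in angstroms
GAS_CONSTANT = 0.0019872  # kcal/(mole*K)


def probe_energies(point, atoms, probe_charge=1.0, repulsive_exp=12,
                   electro_exp=2, steric_max=30.0):
    """Steric and electrostatic energy of a probe at one lattice point.

    `atoms` is a list of (x, y, z, charge, a_coeff, b_coeff) tuples, where
    a_coeff and b_coeff are hypothetical repulsive and attractive
    van der Waals coefficients already combined for the probe-atom pair.
    """
    steric = 0.0
    electro = 0.0
    for x, y, z, charge, a_coeff, b_coeff in atoms:
        # Assumed floor on the distance avoids division by zero.
        r = max(math.dist(point, (x, y, z)), 0.1)
        steric += a_coeff / r ** repulsive_exp - b_coeff / r ** 6   # options 4, 6, 7
        electro += COULOMB_KCAL * probe_charge * charge / r ** electro_exp  # options 5, 8
    return min(steric, steric_max), electro  # option 9: cap the steric value


def boltzmann_average(rows, conformer_energies, temperature=298.15,
                      max_energy=None):
    """Boltzmann-weighted average of per-conformer rows of field values."""
    lowest = min(conformer_energies)
    kept = [(row, e - lowest) for row, e in zip(rows, conformer_energies)
            if max_energy is None or e - lowest <= max_energy]  # option 10
    weights = [math.exp(-rel / (GAS_CONSTANT * temperature)) for _, rel in kept]
    total = sum(weights)
    n_points = len(kept[0][0])
    return [sum(w * row[i] for (row, _), w in zip(kept, weights)) / total
            for i in range(n_points)]
```

Calling `probe_energies` once per lattice intersection yields one conformer's row of values; `boltzmann_average` then collapses the rows of all conformers of a molecule into the single row entered in the 3D-QSAR table.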
In PLS.FOR: 

1) Is cross-validation to be done? If so, the number of cross-validation 
groups. 

2) Is "bootstrapping" to be done? If so, the number of bootstrapping 
trials. 

3) The number of components to extract. 

4) Should data in individual columns be autoscaled (scaled so that the 
mean of values is 0.0 and the standard deviation is 1.0)? (This particular procedure 
is not recommended with CoMFA but is included as a general procedure available 
for use with PLS.) 

5) Is there any relative weighting of columns? (When including other 
properties such as log P, it is necessary to give them extra weight in order to 
compete with the large number of field descriptor columns.) 

6) Convergence criteria, specifically epsilon and the number of iterations, to 
be used within PLS itself. (A warning message is printed if a round of PLS is ended 
by the number of iterations being exceeded rather than by a difference less than 
epsilon being obtained.) 
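
The roles of the number of components (option 3) and the number of cross-validation groups (option 1) can be illustrated with a generic, textbook single-response PLS. The sketch below is not PLS.FOR: the function names are hypothetical, compounds are assigned to cross-validation groups by index modulo the group count purely for simplicity, and the cross-validated r² is computed as 1 - PRESS/SS, where PRESS is the sum of squared prediction errors on held-out compounds and SS is the sum of squared deviations of the activities from their mean.

```python
def pls1_fit(x_rows, y, n_components, epsilon=1e-12):
    """Fit a single-response PLS model by successive deflation."""
    n, m = len(x_rows), len(x_rows[0])
    x_means = [sum(row[j] for row in x_rows) / n for j in range(m)]
    y_mean = sum(y) / n
    x = [[row[j] - x_means[j] for j in range(m)] for row in x_rows]
    resid = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: covariance of each column with the residual activity.
        w = [sum(x[i][j] * resid[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < epsilon:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(x[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(x[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(resid[i] * t[i] for i in range(n)) / tt
        # Deflate both the table and the activity before the next component.
        x = [[x[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        resid = [resid[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_means, y_mean, components


def pls1_predict(model, row):
    """Predict the activity of one compound from its row of descriptors."""
    x_means, y_mean, components = model
    x = [v - mu for v, mu in zip(row, x_means)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(a * b for a, b in zip(x, w))
        y_hat += q * t
        x = [a - t * b for a, b in zip(x, p)]
    return y_hat


def cross_validated_r2(x_rows, y, n_components, n_groups):
    """1 - PRESS/SS, leaving out one group of compounds at a time."""
    n = len(y)
    y_mean = sum(y) / n
    press = 0.0
    for g in range(n_groups):
        train = [i for i in range(n) if i % n_groups != g]
        held_out = [i for i in range(n) if i % n_groups == g]
        model = pls1_fit([x_rows[i] for i in train],
                         [y[i] for i in train], n_components)
        press += sum((y[i] - pls1_predict(model, x_rows[i])) ** 2
                     for i in held_out)
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss
```

Setting `n_groups` equal to the number of compounds gives leave-one-out cross-validation; comparing the cross-validated r² obtained for successive values of `n_components` indicates when additional components stop improving predictions.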
In MAP.C: 

1) What is the source of the 3-D data to plot, contour, or list? The 
options are (1) standard deviation of column times QSAR coefficient, (2) standard 
deviation of column only, (3) QSAR coefficient only, (4) column value for an 
individual compound, (5) column value for an individual compound times QSAR 
coefficient, (6) external file. 

2) Which aspect of 3-D data to plot or contour? The choices are 
steric, electrostatic, or both steric and electrostatic in separate display areas. 
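
The first three data sources listed for MAP.C can be sketched as a per-lattice-point calculation. This is an illustration under stated assumptions, not the MAP.C code: the function name and the `source` labels are hypothetical, each column is assumed to hold the values of one lattice point across all compounds, and the population standard deviation is used because the text does not say which form applies.

```python
import statistics


def map_values(table_columns, coefficients, source="stdev_times_coeff"):
    """Return one value per lattice point for plotting or contouring."""
    values = []
    for column, coeff in zip(table_columns, coefficients):
        spread = statistics.pstdev(column)  # assumed: population form
        if source == "stdev_times_coeff":   # source (1)
            values.append(spread * coeff)
        elif source == "stdev":             # source (2)
            values.append(spread)
        elif source == "coeff":             # source (3)
            values.append(coeff)
        else:
            raise ValueError("unknown source: " + source)
    return values
```

Multiplying by the column standard deviation suppresses lattice points where the compounds barely differ, even when the coefficient there is large.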

The CoMFA methodology has been discussed with reference to particular 
applications. However, application of the methodology to other areas should not be 
considered outside the scope of this disclosure. 
Claims 

1. A computer-based method of generating and visualizing a three-dimensional 
quantitative structure activity relationship of a series of 
molecules comprising the steps of: 

(a) defining molecular shape descriptors for each molecule in said 
series of molecules wherein each molecule is associated with a 
unique parameter value; 

(b) aligning each molecule in said series with the common shape 
elements of all the molecules in said series; 

(c) correlating the molecular shape descriptors and unique parameter 
value of each molecule with all the other molecules in said series; 

(d) visually displaying using computer graphics the correlation among 
the molecules in said series. 

2. The computer-based method of Claim 1 wherein the shape of each 
molecule in Step (a) is defined by a means for calculating the steric and 
electrostatic interaction energies between a mathematical representation of 
a probe and the molecule at every intersection point of a lattice 
surrounding the molecule. 

3. The computer-based method of Claim 2 wherein in Step (b) each 
molecule is aligned by minimizing the root mean squared difference in 
the sum of steric and electrostatic interaction energies averaged across all 
lattice points between the molecule and the other molecules in the series. 

4. The computer-based method of Claim 3 wherein the correlation in Step 
(c) is performed by partial least squares analysis using cross-validation 
after each component extraction. 

5. The computer-based method of Claim 4 wherein the correlation among 
the molecules is visualized in Step (d) by displaying in three dimensions 
the correlation solution values corresponding to each point in lattice 
space. 

6. A computer implemented methodology for deriving a three dimensional 
quantitative structure activity relationship (3-D QSAR) among molecules 
each of which is associated with a measure of activity and whose basic 
structures, including conformers, have been modeled in a three 
dimensional lattice comprising the following steps: 

a. selecting a first conformer of a first molecule; 

b. successively placing a mathematical representation of a probe 
of user specified size and charge at each lattice intersection; 

c. calculating the steric and electrostatic energies of interaction 
between the probe and the conformer at each lattice 
intersection; 

d. entering the steric and electrostatic interaction energies 
calculated in step c in a row of a data table identified with 
the conformer; 

e. selecting a next conformer of the first molecule and repeating 
steps b and c; 

f. aligning said next conformer to the first conformer; 

g. entering the interaction energies for said next conformer 
produced by the alignment as the next row in the data table 
identified with the conformer; 

h. repeating steps e through g for all conformers of the first 
molecule to be considered; 

i. weighting and then averaging the interaction energies across 
all conformers of the first molecule and placing the averaged 
interaction energies in the first row of a 3-D QSAR data 
table along with the measured activity value associated with 
the first molecule; 

j. repeating steps a through i for all molecules to be 
considered; 

k. aligning all molecules to said first molecule in the group 
being considered; 

l. extracting a first component by applying the partial least 
squares statistical methodology to said 3-D QSAR data table; 

m. performing a cross validation cycle on said 3-D QSAR data 
table using solution coefficients resulting from the first 
component extraction; 

n. extracting the next component by applying the partial least 
squares statistical methodology to said 3-D QSAR data table; 

o. performing a cross validation cycle on said 3-D QSAR data 
table using coefficients resulting from said next component 
extraction; 

p. repeating steps n through o until all desired components have 
been extracted; 

q. adding the extracted components; 

r. rotating the partial least squares solution consisting of the 
sum of the extracted components back into the original 
metric space; 

s. deriving the solution coefficients; and 

t. displaying the solution. 

7. The method of claim 6 further comprising the additional step of varying 
the size and charge of the probe by varying its mathematical 
representation in accordance with user specified criteria as it is placed 
successively at each lattice intersection. 

8. The method of claim 6 further comprising the additional step of varying 
the spacing of the lattice intersections in accordance with user specified 
criteria as the mathematical representation of the probe is placed 
successively at each lattice intersection. 

9. The method of claim 6 in which the alignment of the conformers is 
performed by the FIT method. 

10. The method of claim 6 in which the alignment of the conformers is 
performed by the ORIENT method. 

11. The method of claim 6 in which the alignment of the conformers is 
performed by the FIELD FIT method. 

12. The method of claim 6 in which the alignment of the molecules is 
performed by the FIT method. 

13. The method of claim 6 in which the alignment of the molecules is 
performed by the ORIENT method. 

14. The method of claim 6 in which the alignment of the molecules is 
performed by the FIELD FIT method. 

15. The method of claim 6 in which the interaction energies of the 
conformers are weighted before averaging in accordance with a Boltzmann 
distribution over the energies of the conformers. 

16. The method of claim 6 further comprising the additional step, before 
performing the partial least squares analysis, of placing in the columns of 
each row of a 3-D QSAR data table in addition to the interaction energies 
additional molecular parameters associated with the molecule represented 
by the row. 

17. The method of claim 6 in which the solution terms are displayed in three 
dimensional scatter plots corresponding to points in lattice space. 

18. The method of claim 17 further comprising the additional step of 
displaying a molecular model superimposed on the scatter plots. 

19. The method of claim 6 in which the solution terms are displayed in three 
dimensional contour plots defining volumes in lattice space. 

20. The method of claim 19 further comprising the additional step of 
displaying a molecular model superimposed on the contour plots. 

21. The computer implemented Field Fit method of aligning molecules 
according to their shapes where the molecular shape descriptors are the 
calculated molecular field values of steric and electrostatic interaction 
energies between each molecule and a mathematical representation of a 
probe sampled at all points in a three dimensional lattice in which the 
molecules are modeled, comprising the following steps: 

a. generating interaction energies which represent steric 
repulsion beyond the 3-D lattice boundary of said three 
dimensional lattice; and 

b. computing and minimizing the root mean squared difference 
in the sum of the steric and electrostatic interaction energies 
averaged across all lattice points between the molecule to be 
aligned and a reference molecule with respect to the six 
rigid-body degrees of freedom. 

22. The method of claim 21 in which the minimization is performed by the 
Simplex method. 

23. The method of claim 21 further comprising the additional step of 
displaying the molecular alignment. 

24. The method of claim 21 further comprising the additional step of 
weighting the contributions to the minimization of those lattice positions 
which may be particularly significant to alignment of the molecules. 

25. The method of claim 24 further comprising the additional step of 
displaying the effect of various weighting choices on molecular 
alignment. 

26. The method of claim 21 further comprising the additional step of 
weighting the contributions to the minimization of the field differences 
and the edge steric repulsion. 

27. The method of claim 26 further comprising the additional step of 
displaying the effect of various weighting choices on molecular 
alignment. 

28. The method of claim 21 further comprising the additional step of 
minimizing the root mean squared difference in the calculated internal 
energies between the molecule to be aligned and a reference molecule as 
the torsion angles and internal geometry of the molecule to be aligned are 
altered within user defined limits. 

29. The method of claim 28 in which the minimization is performed by the 
Simplex method. 

30. The method of claim 28 further comprising the additional step of 
displaying the molecular alignment. 

31. The method of claim 28 further comprising the additional step of 
weighting the contributions to the minimization of those lattice positions 
which may be particularly significant to alignment of the molecules. 

32. The method of claim 31 further comprising the additional step of 
displaying the effect of the various weighting choices on molecular 
alignment. 

33. The method of claim 28 further comprising the additional step of 
weighting the contributions to the minimization of the field differences, 
the edge steric repulsion, and the differing internal energies as torsion 
angles and internal geometries are altered. 

34. The method of claim 33 further comprising the additional step of 
displaying the effect of the various weighting choices on molecular 
alignment. 

35. The computer implemented method of aligning or docking shape 
complementary molecules where the molecular shape descriptors are the 
calculated molecular field values of steric and electrostatic interaction 
energies between each molecule and a mathematical representation of a 
probe sampled at all points in a three dimensional lattice in which the 
molecules have been modeled, comprising the following steps: 

a. generating interaction energies which represent steric 
repulsion beyond the boundary of said three dimensional 
lattice; and 

b. computing and maximizing the root mean squared difference 
in the sum of the steric and electrostatic interaction energies 
averaged across all lattice points between the molecule to be 
aligned and a complementary molecule with respect to the six 
rigid-body degrees of freedom. 

36. The method of claim 35 in which the maximization is performed by the 
Simplex method. 

37. The method of claim 35 further comprising the additional step of 
displaying the molecular alignment. 

38. The method of claim 35 further comprising the additional step of 
weighting the contributions to the maximization of those lattice positions 
which may be particularly significant to alignment of the molecules. 

39. The method of claim 38 further comprising the additional step of 
displaying the effect of various weighting choices on molecular 
alignment. 

40. The method of claim 35 further comprising the additional step of 
weighting the contributions to the maximization of the field differences 
and the edge steric repulsion. 

41. The method of claim 40 further comprising the additional step of 
displaying the effect of various weighting choices on molecular 
alignment. 

42. The computer implemented method of determining the likely biological or 
chemical activity of a test molecule whose basic structure has been 
modeled in a three dimensional lattice by comparing its three dimensional 
shape to the shape of other molecules of known biological or chemical 
reactivity whose 3D-QSAR has previously been determined by the 
CoMFA methodology, comprising the following steps: 

a. successively placing a mathematical representation of a probe 
of user specified size and charge at each lattice intersection; 

b. calculating the steric and electrostatic energies of interaction 
between the probe and the test molecule at each lattice 
intersection; 

c. aligning the test molecule to the molecules in the molecular 
series used to derive the 3-D QSAR solution coefficients; and 

d. applying the solution coefficients derived in the 3-D QSAR 
CoMFA analysis of the molecular series to the interaction 
energies of the test molecule to predict the biological or 
chemical parameter value which the test molecule should 
possess. 

43. The method of claim 42 further comprising the additional step of 
displaying the calculated interaction energies for the test molecule with 
the previously derived 3-D QSAR solution coefficients in order to 
visualize for comparison areas of similarity or difference. 

44. The method of claim 42 in which the test molecule has not been 
synthesized and whose structure and that of its conformers is determined 
for purposes of placement in the three dimensional lattice from molecular 
modeling considerations or by molecular modeling techniques. 

45. The computer implemented method of generating and visualizing a three 
dimensional structure activity relationship among a group of molecules 
having related chemical or biological properties comprising the following 
steps: 

a. generating for each molecule in the group a row in a data 
table consisting of molecular parameters uniquely associated 
with each individual molecule; 

b. performing a correlation of all the rows of data in the data 
table using the partial least squares statistical methodology 
including cross validation; 

c. rotating the solution back into the original metric space; and 

d. displaying the correlations among the molecules in the group. 

46. A computer implemented method for deriving the correlation between 
molecular descriptors and measured chemical or biological properties of a 
group of molecules where there are many more molecular descriptors for 
each molecule in the group than there are number of molecules in the 
group comprising the following steps: 

a. generating a data table each row of which contains in its 
columns the molecular descriptors associated with a single 
molecule of the group as well as the value of the measured 
chemical or biological property of that molecule; 

b. extracting a first component by applying the partial least 
squares statistical methodology to the rows of the data table; 

c. performing a cross validation cycle on the data table using 
solution coefficients resulting from the first component 
extraction; 

d. extracting the next component by applying the partial least 
squares statistical methodology to the rows of the data table; 

e. performing a cross validation cycle on the data table using 
coefficients resulting from said next component extraction; 

f. repeating steps d and e until all desired components have 
been extracted; 

g. adding the extracted components; 

h. rotating the partial least squares solution consisting of the 
sum of the extracted components back into the original 
metric space and deriving the solution coefficients; and 

i. displaying the solution. 

47. A system for deriving a three dimensional quantitative structure activity 
relationship (3-D QSAR) among molecules each of which is associated 
with a measure of activity and whose basic structures, including 
conformers, have been modeled in a three dimensional lattice comprising: 

a. means for selecting a first conformer of a first molecule; 

b. means for successively placing a mathematical representation 
of a probe of user specified size and charge at each lattice 
intersection; 

c. means for calculating the steric and electrostatic energies of 
interaction between the probe and the conformer at each 
lattice intersection; 

d. means for entering the steric and electrostatic interaction 
energies calculated by means c in a row of a data table 
identified with the conformer; 

e. means for selecting a next conformer of the first molecule 
and invoking said means b and said means c; 

f. means for aligning said next conformer to the first 
conformer; 

g. means for entering the interaction energies for said next 
conformer produced by the alignment as the next row in the 
data table identified with the conformer; 

h. means for invoking means e through g for all conformers of 
the first molecule to be considered; 

i. means for weighting and then averaging the interaction 
energies across all conformers of the first molecule and 
placing the averaged interaction energies in the first row of a 
3-D QSAR data table along with the biological or chemical 
value associated with the first molecule; 

j. means for invoking means a through i for all molecules to be 
considered; 

k. means for aligning all molecules to said first molecule in the 
group being considered; 

l. means for extracting a first component by applying the 
partial least squares statistical methodology to said 3-D 
QSAR data table; 

m. means for performing a cross validation cycle on said 3-D 
QSAR data table using solution coefficients resulting from 
the first component extraction; 

n. means for extracting the next component by applying the 
partial least squares statistical methodology to said 3-D 
QSAR data table; 

o. means for performing a cross validation cycle on said 3-D 
QSAR data table using coefficients resulting from said next 
component extraction; 

p. means for invoking means n through o until all desired 
components have been extracted; 

q. means for adding the extracted components; 

r. means for rotating the partial least squares solution consisting 
of the sum of the extracted components back into the original 
metric space; 

s. means for deriving the solution coefficients; and 
t. means for displaying the solution. 

48. The system of claim 47 further comprising additional means for varying 
the size and charge of the probe by means for varying its mathematical 
representation in accordance with user specified criteria as it is placed 
successively at each lattice intersection. 

49. The system of claim 47 further comprising additional means for varying 
the spacing of the lattice intersections in accordance with user specified 
criteria as the mathematical representation of the probe is placed 
successively at each lattice intersection. 

50. The system of claim 47 in which the means for aligning the conformers 
utilizes the FIT method. 

51. The system of claim 47 in which the means for aligning the conformers 
utilizes the ORIENT method. 

52. The system of claim 47 in which the means for aligning the conformers 
utilizes the FIELD FIT system. 

53. The system of claim 47 in which the means for aligning the molecules 
utilizes the FIT method. 

54. The system of claim 47 in which the means for aligning the molecules 
utilizes the ORIENT method. 

55. The system of claim 47 in which the means for aligning the molecules 
utilizes the FIELD FIT system. 

56. The system of claim 47 further comprising means for weighting the 
interaction energies of the conformers before averaging in accordance 
with a Boltzmann distribution over the energies of the conformers. 

57. The system of claim 47 further comprising, before invoking the means 
for applying the partial least squares analysis, additional means for 
placing in the columns of each row of a 3-D QSAR data table, in addition 
to the interaction energies, additional molecular parameters associated 
with the molecule represented by the row. 

58. The system of claim 47 in which the means for displaying the solution 
further comprises means for displaying the solution terms in three 
dimensional scatter plots corresponding to points in lattice space. 

59. The means for displaying the solution of claim 58 further comprising 
additional means for displaying a molecular model superimposed on the 
scatter plots. 

60. The system of claim 47 in which means for displaying the solution 
further comprises means for displaying the solution terms in three 
dimensional contour plots defining volumes in lattice space. 

61. The means for displaying the solution of claim 60 further comprising 
additional means for displaying a molecular model superimposed on the 
contour plots. 

62. The FIELD FIT system of aligning molecules according to their shapes 
where the molecular shape descriptors are the calculated molecular field 
values of steric and electrostatic interaction energies between each 
molecule and a mathematical representation of a test probe sampled at all 
points in a three dimensional lattice in which the molecules are modeled, 
comprising: 

a. means for generating interaction energies which represent 
steric repulsion beyond the boundary of said three 
dimensional lattice; and 

b. means for computing and minimizing the root mean squared 
difference in the sum of the steric and electrostatic 
interaction energies averaged across all lattice points between 
the molecule to be aligned and a reference molecule with 
respect to the six rigid-body degrees of freedom. 

63. The system of claim 62 in which the means for minimizing utilizes the 
Simplex method. 

64. The system of claim 62 further comprising additional means for 
displaying the molecular alignment. 

65. The system of claim 62 further comprising additional means for 
weighting the contributions to the minimization of those lattice positions 
which may be particularly significant to alignment of the molecules. 

66. The system of claim 65 further comprising additional means for 
displaying the effect of various weighting choices on molecular 
alignment. 

67. The system of claim 62 further comprising additional means for 
weighting the contributions to the minimization of the field differences 
and the edge steric repulsion. 

68. The system of claim 67 further comprising additional means for 
displaying the effect of various weighting choices on molecular 
alignment. 

69. The system of claim 62 further comprising additional means for 
minimizing the root mean squared difference in the calculated internal 
energies between the molecule to be aligned and a reference molecule as 
the torsion angles and internal geometry of the molecule to be aligned are 
altered within user defined limits. 

70. The system of claim 69 in which the means for minimizing utilizes the 
Simplex method. 

71. The system of claim 69 further comprising additional means for 
displaying the molecular alignment. 

72. The system of claim 69 further comprising additional means for 
weighting the contributions to the minimization of those lattice positions 
which may be particularly significant to alignment of the molecules. 

73. The system of claim 72 further comprising additional means for 
displaying the effect of the various weighting choices on molecular 
alignment. 

74. The system of claim 69 further comprising additional means for 
weighting the contributions to the minimization of the field differences, 
the edge steric repulsion, and the differing internal energies as torsion 
angles and internal geometries are altered. 

75. The system of claim 74 further comprising additional means for 
displaying the effect of the various weighting choices on molecular 
alignment. 

76. The system of aligning or docking shape complementary molecules where 
the molecular shape descriptors are the calculated molecular field values 
of steric and electrostatic interaction energies between each molecule and 
a mathematical representation of a probe sampled at all points in a three 
dimensional lattice in which the molecules have been modeled, 
comprising: 

a. means for generating interaction energies which represent 
steric repulsion beyond the boundary of said three 
dimensional lattice; and 

b. means for computing and maximizing the root mean squared 
difference in the sum of the steric and electrostatic 
interaction energies averaged across all lattice points between 
the molecule to be aligned and a complementary molecule 
with respect to the six rigid-body degrees of freedom. 

77. The system of claim 76 in which the means for maximizing utilizes the 
Simplex method. 

78. The system of claim 76 further comprising additional means for 
displaying the molecular alignment. 

79. The system of claim 76 further comprising additional means for
weighting the contributions to the maximization of those lattice positions
which may be particularly significant to alignment of the molecules. 

80. The system of claim 79 further comprising additional means for 
displaying the effect of various weighting choices on molecular
alignment.

81. The system of claim 76 further comprising additional means for 
weighting the contributions to the maximization of the field differences 
and the edge steric repulsion. 

82. The system of claim 81 further comprising additional means for 
displaying the effect of various weighting choices on molecular
alignment.

83. The system for determining the likely biological or chemical activity of a 
test molecule whose basic structure has been modeled in a three 
dimensional lattice by comparing its three dimensional shape to the shape
of other molecules of known biological or chemical reactivity whose
3D-QSAR has previously been determined by the CoMFA methodology,
comprising:

a. means for successively placing a mathematical representation
of a probe at each lattice intersection;

b. means for calculating the steric and electrostatic energies of 
interaction between the mathematical representation of the 
probe and the test molecule at each lattice intersection; 

c. means for aligning the test molecule to the molecules in the 
molecular series used to derive the 3-D QSAR solution 
coefficients; and 

d. means for applying the solution coefficients derived in the
3-D QSAR CoMFA analysis of the molecular series to the
interaction energies of the test molecule to predict the
biological or chemical parameter value which the test
molecule should possess.
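A minimal sketch of element d of claim 83, assuming the solution coefficients form a linear model over the lattice energies: the predicted value is a constant term plus the sum of each coefficient times the corresponding interaction energy of the aligned test molecule. The name `predict_activity` and the explicit `intercept` argument are assumptions of this sketch, not terms used in the claims.

```python
def predict_activity(coefficients, intercept, energies):
    """Apply previously derived solution coefficients to a test molecule.

    coefficients: one coefficient per data-table column (steric and
        electrostatic energy at each lattice point), assumed layout.
    intercept: constant term of the linear model (assumption).
    energies: interaction energies of the aligned test molecule, in the
        same column order as the coefficients.
    """
    if len(coefficients) != len(energies):
        raise ValueError("coefficients and energies must have equal length")
    return intercept + sum(c * e for c, e in zip(coefficients, energies))
```

The test molecule must be aligned to the original series before its energies are computed, since each coefficient refers to a fixed lattice position.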

84. The system of claim 83 further comprising additional means for
displaying the calculated interaction energies for the test molecule with
the previously derived 3-D QSAR solution coefficients in order to
visualize for comparison areas of similarity or difference.

85. The system of claim 83 in which the test molecule has not been
synthesized and whose structure and that of its conformers is determined
for purposes of placement in the three dimensional lattice from molecular
modeling considerations or by molecular modeling techniques.

86. The system of generating and visualizing a three dimensional structure
activity relationship among a group of molecules having related chemical
or biological properties comprising:

a. means for generating for each molecule in the group a row in 
a data table consisting of molecular parameters uniquely 
associated with each individual molecule; 

b. means for performing a correlation of all the rows of data in 
the data table using the partial least squares statistical 
methodology including cross validation; 

c. means for rotating the solution back into the original metric
space; and

d. means for displaying the correlations among the molecules in
the group.

87. A system for deriving the correlation between molecular descriptors and 
measured chemical or biological properties of a group of molecules where 
there are many more molecular descriptors for each molecule in the
group than there are molecules in the group, comprising:

a. means for generating a data table, each row of which 
contains in its columns the molecular descriptors associated 
with a single molecule of the group as well as the value of
the measured chemical or biological property of the
molecule;

b. means for extracting a first component by applying the 
partial least squares statistical methodology to the rows of the 
data table; 

c. means for performing a cross validation cycle on the data
table using solution coefficients resulting from the first
component extraction; 

d. means for extracting the next component by applying the 
partial least squares statistical methodology to the rows of the
data table;

e. means for performing a cross validation cycle on the data 
table using coefficients resulting from said next component 
extraction; 

f. means for invoking means d and e until all desired 
components have been extracted;

g. means for adding the extracted components; 

h. means for rotating the partial least squares solution consisting 
of the sum of the extracted components back into the original 
metric space and deriving the solution coefficients; and 

i. means for displaying the solution.
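The component-by-component procedure of claim 87 can be sketched roughly as below. This is not the patented implementation: it assumes a single measured property per molecule, a NIPALS-style PLS1 extraction, and leave-one-out cross validation scored by the predictive residual sum of squares (PRESS). The names `pls1_fit`, `pls1_predict`, and `loo_press` are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_components):
    """Extract components one at a time; return (x_mean, y_mean, coefs_by_count).

    coefs_by_count[a - 1] holds the coefficients, expressed in the original
    descriptor space, of the model built from the first a components.
    """
    n, n_cols = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(n_cols)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(n_cols)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                     # centered y
    loadings, rotations, coefs_by_count = [], [], []
    coefs = [0.0] * n_cols
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(n_cols)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]          # scores
        tt = _dot(t, t)
        if tt < 1e-12:
            break
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(n_cols)]
        q = _dot(f, t) / tt
        # Rotate the weight vector back into the original descriptor space.
        r = w[:]
        for p_k, r_k in zip(loadings, rotations):
            c = _dot(p_k, w)
            r = [rv - c * rk for rv, rk in zip(r, r_k)]
        loadings.append(load)
        rotations.append(r)
        # Add this component's contribution to the running solution.
        coefs = [b + q * rv for b, rv in zip(coefs, r)]
        coefs_by_count.append(coefs)
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(n_cols)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
    while len(coefs_by_count) < n_components:    # pad if extraction stopped early
        coefs_by_count.append(coefs)
    return x_mean, y_mean, coefs_by_count


def pls1_predict(model, x, count):
    """Predict the property of descriptor row x using the first `count` components."""
    x_mean, y_mean, coefs_by_count = model
    b = coefs_by_count[count - 1]
    return y_mean + sum(bj * (xj - mj) for bj, xj, mj in zip(b, x, x_mean))


def loo_press(X, y, n_components):
    """Leave-one-out PRESS for models with 1..n_components components."""
    press = [0.0] * n_components
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        for a in range(1, n_components + 1):
            err = y[i] - pls1_predict(model, X[i], a)
            press[a - 1] += err * err
    return press
```

In such a scheme, the PRESS value obtained after each additional component indicates whether that component improves predictive ability, which is one way to decide when all desired components have been extracted.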

88. A system for generating and visualizing a three dimensional quantitative 
structure activity relationship of a series of molecules comprising:

a. means for defining molecular shape descriptors for each
molecule in said series of molecules wherein each molecule 
is associated with a unique parameter value; 

b. means for aligning each molecule in said series with the 
common shape elements of all the molecules in said series; 

c. means for correlating the molecular shape descriptors and 
unique parameter value of each molecule with all the other 
molecules in said series; and 

d. means for visually displaying using computer graphics the 
correlation among the molecules in said series. 

89. The system of claim 88 wherein the means for defining the shapes further
comprises means for calculating the steric and electrostatic interaction
energies between a mathematical representation of a probe and the
molecule at every intersection point of a lattice surrounding the molecule.
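The lattice calculation recited just above might be sketched as follows. The functional forms are assumptions of this sketch: a 6-12 Lennard-Jones term for the steric energy, a Coulomb term with unit dielectric for the electrostatic energy, and a cap on large magnitudes. The atom tuple layout, the probe parameters, the `cutoff` value, and all function names are hypothetical.

```python
import math

# Conversion factor giving kcal/mol for charges in electron units and
# distances in Angstroms (a commonly used value; assumed here).
COULOMB_CONSTANT = 332.0637


def lattice_points(lower, upper, spacing):
    """Regularly spaced lattice intersections inside a box (inclusive)."""
    counts = [int(math.floor((hi - lo) / spacing + 1e-9)) + 1
              for lo, hi in zip(lower, upper)]
    return [(lower[0] + i * spacing, lower[1] + j * spacing, lower[2] + k * spacing)
            for i in range(counts[0])
            for j in range(counts[1])
            for k in range(counts[2])]


def probe_energies(atoms, point, probe_charge=1.0, probe_radius=1.7,
                   probe_epsilon=0.1, cutoff=30.0):
    """Steric and electrostatic energy of a probe placed at one lattice point.

    atoms: iterable of (x, y, z, charge, radius, epsilon) tuples (assumed layout).
    """
    steric = 0.0
    electrostatic = 0.0
    for x, y, z, charge, radius, epsilon in atoms:
        # Guard against a probe sitting exactly on an atom.
        r = max(math.dist(point, (x, y, z)), 1e-6)
        ratio6 = ((radius + probe_radius) / r) ** 6
        steric += math.sqrt(epsilon * probe_epsilon) * (ratio6 * ratio6 - 2.0 * ratio6)
        electrostatic += COULOMB_CONSTANT * charge * probe_charge / r
    # Cap large values so points inside the molecule do not dominate.
    steric = min(steric, cutoff)
    electrostatic = max(-cutoff, min(electrostatic, cutoff))
    return steric, electrostatic


def field_row(atoms, points, **probe):
    """One data-table row: all steric energies, then all electrostatic energies."""
    energies = [probe_energies(atoms, pt, **probe) for pt in points]
    return [s for s, _ in energies] + [e for _, e in energies]
```

Each molecule contributes one such row, so a lattice with N intersections yields 2N descriptor columns per molecule.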

90. The system of claim 89 wherein the means for aligning each molecule
further comprises means for minimizing the root mean squared difference
in the sum of steric and electrostatic interaction energies averaged across
all lattice points between the molecule and the other molecules in the
series.

91. The system of claim 90 wherein the means for correlating the shape and
unique parameter value further comprises means for performing partial 
least squares analysis using cross validation after each component 
extraction. 

92. The system of claim 91 wherein the means for visually displaying the
correlation further comprises means for displaying in three dimensions 
the correlation solution values corresponding to each point in lattice 
space. 

93. A computer based method of designing a molecule which will bind to a
larger molecule which is known to bind other molecules with measured
affinities comprising the following steps:

a. modeling in a three dimensional lattice the basic structures,
including conformers, of molecules known to bind with
measured affinities to the larger molecule;

b. selecting a first conformer of a first molecule;

c. successively placing a mathematical representation of a probe
of user specified size and charge at each lattice intersection;

d. calculating the steric and electrostatic energies of interaction
between the probe and the conformer at each lattice
intersection;

e. entering the steric and electrostatic interaction energies
calculated in step d in a row of a data table identified with
the conformer;

f. selecting a next conformer of the first molecule and repeating
steps c and d;

g. aligning said next conformer to the first conformer;

h. entering the interaction energies for said next conformer
produced by the alignment as the next row in the data table
identified with the conformer;

i. repeating steps f through h for all conformers of the first
molecule to be considered;

j. weighting and then averaging the interaction energies across
all conformers of the first molecule and placing the averaged
interaction energies in the first row of a second data table
along with the measured activity value associated with the
first molecule;

k. repeating steps b through j for all molecules to be
considered;

l. aligning all molecules to said first molecule in the group
being considered;

m. extracting a first component by applying the partial least
squares statistical methodology to said second data table;

n. performing a cross validation cycle on said second data table
using solution coefficients resulting from the first component 
extraction; 

o. extracting the next component by applying the partial least
squares statistical methodology to said second data table;

p. performing a cross validation cycle on said second data table 
using coefficients resulting from said next component 
extraction; 

q. repeating steps o through p until all desired components have
been extracted;

r. adding the extracted components; 

s. rotating the partial least squares solution consisting of the 
sum of the extracted components back into the original 
metric space; 

t. deriving the solution coefficients;

u. displaying the solution; and 

v. synthesizing a molecule with atoms arranged to occupy or
not occupy, as is required, the three dimensional
spaces/volumes indicated in the display as being critical to
binding of the molecule to the larger molecule.
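The weighting and averaging of conformer energies recited in the method above could be sketched as shown here. The claims do not state how the weights are chosen; the Boltzmann weighting in `boltzmann_weights` is offered only as one plausible assumption, and both function names are hypothetical.

```python
import math


def boltzmann_weights(conformer_energies, rt=0.593):
    """Hypothetical weighting: Boltzmann factors relative to the lowest energy.

    conformer_energies: internal energy of each conformer in kcal/mol.
    rt: gas constant times temperature, about 0.593 kcal/mol near 298 K.
    """
    lowest = min(conformer_energies)
    return [math.exp(-(e - lowest) / rt) for e in conformer_energies]


def average_conformer_rows(rows, weights):
    """Weighted average of the per-conformer rows of the first data table.

    rows: one list of interaction energies per conformer, all of equal
        length and in the same lattice order.
    weights: one non-negative weight per conformer.
    """
    if not rows or len(rows) != len(weights):
        raise ValueError("need one weight per conformer row")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_cols = len(rows[0])
    return [sum(w * row[j] for w, row in zip(weights, rows)) / total
            for j in range(n_cols)]
```

The averaged row, together with the measured activity value of the molecule, would then form one row of the second data table.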

94. A system for designing a molecule which will bind to a larger molecule
which is known to bind other molecules with measured affinities
comprising:

a. means for modeling in a three dimensional lattice the basic
structures, including conformers, of molecules known to bind
with measured affinities to the larger molecule;

b. means for selecting a first conformer of a first molecule; 

c. means for successively placing a mathematical representation
of a probe of user specified size and charge at each lattice
intersection;

d. means for calculating the steric and electrostatic energies of
interaction between the probe and the conformer at each
lattice intersection;

e. means for entering the steric and electrostatic interaction
energies calculated in step d in a row of a data table
identified with the conformer;

f. means for selecting a next conformer of the first molecule 
and repeating steps c and d; 

g. means for aligning said next conformer to the first
conformer;

h. means for entering the interaction energies for said next
conformer produced by the alignment as the next row in the
data table identified with the conformer;

i. means for repeating steps f through h for all conformers of
the first molecule to be considered;

j. means for weighting and then averaging the interaction 
energies across all conformers of the first molecule and 
placing the averaged interaction energies in the first row of a 
second data table along with the measured activity value 
associated with the first molecule;

k. means for repeating steps b through j for all molecules to be
considered;

l. means for aligning all molecules to said first molecule in the
group being considered;

m. means for extracting a first component by applying the
partial least squares statistical methodology to said second
data table;

n. means for performing a cross validation cycle on said second
data table using solution coefficients resulting from the first
component extraction;

o. means for extracting the next component by applying the
partial least squares statistical methodology to said second 
data table; 

p. means for performing a cross validation cycle on said second 
data table using coefficients resulting from said next 
component extraction;

q. means for repeating steps o through p until all desired
components have been extracted;

r. means for adding the extracted components;

s. means for rotating the partial least squares solution consisting
of the sum of the extracted components back into the original
metric space and deriving the solution coefficients;

t. means for displaying the solution; and

u. means for synthesizing a molecule with atoms arranged to
occupy or not occupy, as is required, the three dimensional
spaces/volumes indicated in the display as being critical to
binding of the molecule to the larger molecule.

95. A molecule having an overall structure substantially similar to those of
other molecules known to bind to a common molecule, and having shape 
determinants, which are designed to increase or decrease said molecule's
reactivity with said common molecule, in those regions where a 3D- 
QSAR indicates that changes in shape determinants are strongly 
correlated with an increase or decrease in reactivity with said common 
molecule, said 3D-QSAR being derived by a computer-based method 
comprising the steps of: 

(a) defining the shape of each molecule in a series of molecules 
which bind to said common molecule wherein each molecule 
is associated with a unique parameter value; 

(b) aligning each molecule in the series with the common shape 
elements of all the molecules in the series; 

(c) correlating the shape and unique parameter value of each
molecule with all the other molecules in the series; and

(d) visually displaying using computer graphics the correlation
among the molecules in the series.

96. The molecule of Claim 95 wherein the shape of each molecule in Step (a)
is defined by a means for calculating the steric and electrostatic 
interaction energies between a mathematical representation of a probe and 
the molecule at every intersection point of a lattice surrounding the 
molecule. 

97. The molecule of Claim 96 wherein in Step (b) each molecule is aligned
by minimizing the root mean squared difference in the sum of steric and
electrostatic interaction energies averaged across all lattice points between
the molecule and the other molecules in the series.

98. The molecule of Claim 97 wherein the correlation in Step (c) is
performed by partial least squares analysis using cross-validation after 
each component extraction. 

99. The molecule of Claim 98 wherein the correlation among the molecules is
visualized in Step (d) by displaying in three dimensions the correlation
solution values corresponding to each point in lattice space.

100. A molecule having an overall structure substantially similar to those of
other molecules known to bind to a common molecule, and having atoms 
arranged to occupy or not occupy, as is required to increase or decrease 
said molecule's reactivity with said common molecule, the three 
dimensional regions about said molecule where a 3D-QSAR indicates that 
changes in shape determinants are strongly correlated with an increase or 
decrease in reactivity with the common molecule, said 3D-QSAR being 
derived by a computer-based method comprising the steps of: 

(1) defining the shape of each molecule in a series of 
molecules which bind to said common molecule 
wherein each molecule is associated with a unique 
parameter value; 

(2) aligning each molecule in the series with the common 
shape elements of all the molecules in the series; 

(3) correlating the shape and unique parameter value of 
each molecule with all the other molecules in the 
series; and 

(4) visually displaying using computer graphics the 
correlation among the molecules in the series. 

101. The molecule of Claim 100 wherein the shape of each molecule in Step
(1) is defined by a means for calculating the steric and electrostatic 
interaction energies between a mathematical representation of a probe and 
the molecule at every intersection point of a lattice surrounding the 
molecule. 

102. The molecule of Claim 101 wherein in Step (2) each molecule is aligned
by minimizing the root mean squared difference in the sum of steric and
electrostatic interaction energies averaged across all lattice points between
the molecule and the other molecules in the series.

103. The molecule of Claim 102 wherein the correlation in Step (3) is
performed by partial least squares analysis using cross-validation after
each component extraction.

104. The molecule of Claim 103 wherein the correlation among the molecules
is visualized in Step (4) by displaying in three dimensions the correlation 
solution values corresponding to each point in lattice space. 

105. A molecule having an overall structure substantially similar to those of 
other molecules known to bind to a common molecule, and having shape 
determinants, which are designed to increase or decrease said molecule's 
reactivity with said common molecule, in those regions where a 3D- 
QSAR indicates that changes in shape determinants are strongly 
correlated with an increase or decrease in reactivity with the common 
molecule, said 3D-QSAR being derived by a computer-based method
comprising the steps of:

(a) generating for each molecule in the group a row in a data 
table consisting of molecular parameters uniquely associated 
with each individual molecule; 

(b) performing a correlation of all the rows of data in the data 
table using the partial least squares statistical methodology 
including cross validation; 

(c) rotating the solution back into the original metric space; and 

(d) displaying the correlations among the molecules in the group.